9. Get the dataset

GETTING THE DATASET

Activity

  • As you can probably tell, the course is quite large: it has over 200 tutorials and over 35 hours of content.

  • With that much training, we need lots and lots of datasets, and that's why we have decided to place all of the datasets on their own separate page.

Downloading the Dataset and Template Folder

  • To get the datasets, go to the Super Data Science website.

  • There you will see a whole page dedicated to this course, with lots and lots of datasets available, which you can download onto your machine in order to follow along with the tutorials.

  • Today we are going to start off with the very first one, the data preprocessing dataset, as an example. Throughout the course, you will be able to download the right dataset for every single session.

  • You can also get the Machine Learning A-Z Template Folder.

  • This is a special template folder that we've created to help you store these datasets in a hierarchical fashion.

  • The folder is organized this way so that you can navigate the datasets more easily and so that they are all in the right place.

  • Go ahead and download the two zip files: the Machine Learning A-Z Template Folder and Data_Preprocessing.zip.

  • Whenever you get to a new section, we will of course remind you at the start of the section to download the right dataset.

  • Unzip the two files to a location where you can easily access them later.

  • Remember to unzip the contents of Data_Preprocessing.zip into the Section 2 --- Part 1 Data Preprocessing folder, which sits inside Part 1 - Data Preprocessing in the Machine Learning A-Z Template Folder.

  • If we need to update any dataset, we will upload the updated zip file straight to the Super Data Science website.
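
The unzip step described above can also be done from a script rather than by hand. The following is a minimal sketch using Python's standard zipfile module. The helper name extract_zip and both paths are assumptions made for illustration; adjust them to wherever you saved the download and to the matching folder in your copy of the template folder.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if it is missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Hypothetical locations (assumptions, not taken from the course site):
# where the zip was downloaded, and the folder it should be unpacked into.
downloaded = Path("Data_Preprocessing.zip")
target = Path("Machine Learning A-Z Template Folder") / "Part 1 - Data Preprocessing"

if downloaded.exists():
    print(extract_zip(downloaded, target))
else:
    print(f"{downloaded} not found; download it first")
```

The same helper can be reused for each later section by changing the two paths.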

Opening Data.csv

  • When you open Data.csv, you will see four headers: Country, Age, Salary, and Purchased.

  • The dataset contains information about some customers: their country, age, salary, and whether they purchased the company's product.

  • There are independent and dependent variables inside the dataset.

  • The first three columns, Country, Age, and Salary, are the independent variables.

  • The dependent variable is Purchased, the fourth column.

  • In any machine learning model, we use some independent variables to predict a dependent variable.

  • In this case, we are going to use the first three variables to predict whether the customer purchased a product or not.
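
To make the split between independent and dependent variables concrete, here is a small sketch using only Python's standard csv module. The rows in SAMPLE_CSV are made-up placeholders that share the four headers of Data.csv; they are not the real values from the file. The function name split_features_target is also an assumption introduced for this example.

```python
import csv
import io

# Made-up placeholder rows with the same four headers as Data.csv.
# These are NOT the values from the real file.
SAMPLE_CSV = """Country,Age,Salary,Purchased
CountryA,30,50000,No
CountryB,45,72000,Yes
CountryA,38,61000,Yes
"""


def split_features_target(csv_file, target_column="Purchased"):
    """Return (feature_names, X, y), where y holds the target column values."""
    reader = csv.DictReader(csv_file)
    # Every column except the target is treated as an independent variable.
    feature_names = [name for name in reader.fieldnames if name != target_column]
    X, y = [], []
    for row in reader:
        X.append([row[name] for name in feature_names])
        y.append(row[target_column])
    return feature_names, X, y


feature_names, X, y = split_features_target(io.StringIO(SAMPLE_CSV))
print(feature_names)  # ['Country', 'Age', 'Salary']
print(X[0], y[0])
```

To try this on the real file, open it with open("Data.csv", newline="") and pass the file object to the function in place of the in-memory sample. Note that the csv module reads every value as a string, so Age and Salary would still need converting to numbers before any modeling.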
