Data Prep for Machine Learning in Python
Learn all the essential skills to prep your data in Python, from data cleaning and imputation to EDA, feature selection, and feature engineering.
Introduction to Data Prep for Machine Learning
Pre-requisite Knowledge
A Quick Guide to Course Structure, Notebooks, and Exercises
Chapter Intro - Importing & Cleaning Data
Importing Data - CSV, Excel, and SQL
Selecting Columns
Filtering Rows
Exercise - Import & Filter Data
Exercise Review - Import & Filter Data
Data Types Theory
Basic Data Validation
Comparing to a Trusted Datasource
Exercise - Data Validation
Exercise Review - Data Validation
Imputation Theory
Cleaning Data
Data Type Errors
Imputation with Zeros
Basic Imputation of Values
Exercise - Cleaning & Imputation
Exercise Review - Cleaning & Imputation
Chapter Introduction - EDA
Descriptive Stats for Numeric Features
Basic Plots for Numeric Features + Combining Axes & Functions
Basic Plots for Categorical Features
Basic Plots for Categorical Features
Exercise Review - Visuals for Numeric & Categorical Features
Continuous vs Continuous Variable Analysis 1
Continuous vs Continuous Variable Analysis 2
Categorical vs Continuous Variable Analysis
Categorical vs Categorical Variable Analysis
Exercise Review - Creating and Analyzing Multivariate Plots
Chapter Intro - Feature Engineering
Training vs Testing in Feature Engineering
Encoding Theory (incl. One Hot Encoding)
Identifying Categorical Columns & Values
One Hot Encoding in Pandas
One Hot Encoding in SKLearn
Exercise - One Hot Encoding
Exercise Review - One Hot Encoding
Exercise Review - One Hot Encoding Pt 2
get_dummies vs OneHotEncoder
Transforming Distributions Theory
Identifying Skew in Python
Transforming Features in Python
Taking Logs Scenarios
Exercise - Transformations
Exercise Review - Transformations
Outliers Theory
Removing Outliers
Modifying Outliers
Exercise - Outliers
Exercise Review - Outliers
Binning Theory
Categorical Binning
Binning by Width & Frequency
Manual Binning
Final Thoughts on Binning
Smoothing
Smoothing in Practice
Exercise - Binning
Exercise Review - Binning
Final Thoughts on Binning
Why Feature Scaling Matters
Scaling Features Theory
Min Max Scaling
Scaling Testing Data
Standard Scaler
Final Thoughts on Scaling
Exercise - Scaling
Exercise Review - Scaling
Making Feature Engineering Decisions
Chapter Intro - Feature Selection
Manual Feature Selection
Feature Selection with Continuous Target
Correlation Coefficients - Continuous Var + Continuous Feature
ANOVA - Continuous Target + Categorical Feature
Feature Selection with Categorical Target Variable
Box Plots - Categorical Var + Continuous Feature
Chi-square - Categorical Var + Categorical Feature
Summary of Feature Selection Techniques
Course Conclusion
Qualified Assessment