# The Data Science Course 2018: Complete Data Science Bootcamp

**Part 1: Introduction**

A Practical Example: What You Will Learn in This Course

What Does the Course Cover

**The Field of Data Science - The Various Data Science Disciplines**

Data Science and Business Buzzwords: Why are there so many?

What is the Difference between Analysis and Analytics

Business Analytics, Data Analytics, and Data Science: An Introduction

Continuing with BI, ML, and AI

A Breakdown of our Data Science Infographic

1 question

**The Field of Data Science - Connecting the Data Science Disciplines**

Applying Traditional Data, Big Data, BI, Traditional Data Science and ML

1 question

**The Field of Data Science - The Benefits of Each Discipline**

The Reason behind these Disciplines

1 question

**The Field of Data Science - Popular Data Science Techniques**

Techniques for Working with Traditional Data

Real Life Examples of Traditional Data

Techniques for Working with Big Data

Real Life Examples of Big Data

Business Intelligence (BI) Techniques

Real Life Examples of Business Intelligence (BI)

Techniques for Working with Traditional Methods

Real Life Examples of Traditional Methods

Machine Learning (ML) Techniques

Types of Machine Learning

Real Life Examples of Machine Learning (ML)

5 questions

**The Field of Data Science - Popular Data Science Tools**

Necessary Programming Languages and Software Used in Data Science

4 questions

**The Field of Data Science - Careers in Data Science**

Finding the Job - What to Expect and What to Look for

1 question

**The Field of Data Science - Debunking Common Misconceptions**

Debunking Common Misconceptions

1 question

**Part 2: Statistics**

Population and Sample

2 questions

**Statistics - Descriptive Statistics**

Types of Data

Levels of Measurement

Categorical Variables - Visualization Techniques

Categorical Variables Exercise

Numerical Variables - Frequency Distribution Table

Numerical Variables Exercise

The Histogram

Histogram Exercise

Cross Tables and Scatter Plots

Cross Tables and Scatter Plots Exercise

Mean, Median and Mode

Mean, Median and Mode Exercise

Skewness

Skewness Exercise

Variance

Variance Exercise

Standard Deviation and Coefficient of Variation

Standard Deviation

Standard Deviation and Coefficient of Variation Exercise

Covariance

Covariance Exercise

Correlation Coefficient

Correlation

Correlation Coefficient Exercise

**Statistics - Practical Example: Descriptive Statistics**

Practical Example: Descriptive Statistics

Practical Example: Descriptive Statistics Exercise

**Statistics - Inferential Statistics Fundamentals**

Introduction

What is a Distribution

The Normal Distribution

The Standard Normal Distribution

The Standard Normal Distribution Exercise

Central Limit Theorem

Standard Error

Estimators and Estimates

**Statistics - Inferential Statistics: Confidence Intervals**

What are Confidence Intervals?

Confidence Intervals; Population Variance Known; z-score

Confidence Intervals; Population Variance Known; z-score; Exercise

Confidence Interval Clarifications

Student's T Distribution

Confidence Intervals; Population Variance Unknown; t-score

Confidence Intervals; Population Variance Unknown; t-score; Exercise

Margin of Error

Confidence Intervals. Two Means. Dependent Samples

Confidence Intervals. Two Means. Dependent Samples Exercise

Confidence Intervals. Two Means. Independent Samples (Part 1)

Confidence Intervals. Two Means. Independent Samples (Part 1) Exercise

Confidence Intervals. Two Means. Independent Samples (Part 2)

Confidence Intervals. Two Means. Independent Samples (Part 2) Exercise

Confidence Intervals. Two Means. Independent Samples (Part 3)

**Statistics - Practical Example: Inferential Statistics**

Practical Example: Inferential Statistics

Practical Example: Inferential Statistics Exercise

**Statistics - Hypothesis Testing**

Null vs Alternative Hypothesis

Further Reading on Null and Alternative Hypothesis

Rejection Region and Significance Level

Type I Error and Type II Error

Test for the Mean. Population Variance Known

Test for the Mean. Population Variance Known Exercise

p-value

Test for the Mean. Population Variance Unknown

Test for the Mean. Population Variance Unknown Exercise

Test for the Mean. Dependent Samples

Test for the Mean. Dependent Samples Exercise

Test for the Mean. Independent Samples (Part 1)

Test for the Mean. Independent Samples (Part 1) Exercise

Test for the Mean. Independent Samples (Part 2)

Test for the Mean. Independent Samples (Part 2) Exercise

**Statistics - Practical Example: Hypothesis Testing**

Practical Example: Hypothesis Testing

Practical Example: Hypothesis Testing Exercise

**Part 3: Introduction to Python**

Introduction to Programming

Why Python?

Why Jupyter?

Installing Python and Jupyter

Understanding Jupyter's Interface - the Notebook Dashboard

Prerequisites for Coding in the Jupyter Notebooks

Jupyter's Interface

**Python - Variables and Data Types**

Variables

Numbers and Boolean Values in Python

Python Strings

**Python - Basic Python Syntax**

Using Arithmetic Operators in Python

The Double Equality Sign

How to Reassign Values

Add Comments

Understanding Line Continuation

Indexing Elements

Structuring with Indentation

**Python - Other Python Operators**

Comparison Operators

Logical and Identity Operators

**Python - Conditional Statements**

The IF Statement

The ELSE Statement

The ELIF Statement

A Note on Boolean Values

**Python - Python Functions**

Defining a Function in Python

How to Create a Function with a Parameter

Defining a Function in Python - Part II

How to Use a Function within a Function

Conditional Statements and Functions

Functions Containing a Few Arguments

Built-in Functions in Python

Python Functions

**Python - Sequences**

Lists

Using Methods

List Slicing

Tuples

Dictionaries

**Python - Iterations**

For Loops

While Loops and Incrementing

Lists with the range() Function

Conditional Statements and Loops

Conditional Statements, Functions, and Loops

How to Iterate over Dictionaries

**Python - Advanced Python Tools**

Object Oriented Programming

Modules and Packages

What is the Standard Library?

Importing Modules in Python

**Part 4: Advanced Statistical Methods in Python**

Introduction to Regression Analysis

**Advanced Statistical Methods - Linear Regression**

The Linear Regression Model

Correlation vs Regression

Geometrical Representation of the Linear Regression Model

Python Packages Installation

First Regression in Python

First Regression in Python Exercise

Using Seaborn for Graphs

How to Interpret the Regression Table

Decomposition of Variability

What is the OLS?

R-Squared

**Advanced Statistical Methods - Multiple Linear Regression**

Multiple Linear Regression

Adjusted R-Squared

Multiple Linear Regression Exercise

Test for Significance of the Model (F-Test)

OLS Assumptions

A1: Linearity

A2: No Endogeneity

A3: Normality and Homoscedasticity

A4: No Autocorrelation

A5: No Multicollinearity

Dealing with Categorical Data - Dummy Variables

Making Predictions with the Linear Regression

**Advanced Statistical Methods - Logistic Regression**

Introduction to Logistic Regression

A Simple Example in Python

Logistic vs Logit Function

Building a Logistic Regression

Building a Logistic Regression - Exercise

An Invaluable Coding Tip

Understanding Logistic Regression Tables

Understanding Logistic Regression Tables - Exercise

What do the Odds Actually Mean

Binary Predictors in a Logistic Regression

Binary Predictors in a Logistic Regression - Exercise

Calculating the Accuracy of the Model

Underfitting and Overfitting

Testing the Model

Testing the Model - Exercise

**Advanced Statistical Methods - Cluster Analysis**

Introduction to Cluster Analysis

Some Examples of Clusters

Difference between Classification and Clustering

Math Prerequisites

**Advanced Statistical Methods - K-Means Clustering**

K-Means Clustering

A Simple Example of Clustering

A Simple Example of Clustering - Exercise

Clustering Categorical Data

How to Choose the Number of Clusters

How to Choose the Number of Clusters - Exercise

Pros and Cons of K-Means Clustering

To Standardize or not to Standardize

Relationship between Clustering and Regression

Market Segmentation with Cluster Analysis (Part 1)

Market Segmentation with Cluster Analysis (Part 2)

How is Clustering Useful?

EXERCISE: Species Segmentation with Cluster Analysis (Part 1)

EXERCISE: Species Segmentation with Cluster Analysis (Part 2)

**Advanced Statistical Methods - Other Types of Clustering**

Types of Clustering

Dendrogram

Heatmaps

**Part 5: Mathematics**

What is a matrix?

Scalars and Vectors

Linear Algebra and Geometry

Arrays in Python - A Convenient Way To Represent Matrices

What is a Tensor?

Addition and Subtraction of Matrices

Errors when Adding Matrices

Transpose of a Matrix

Dot Product

Dot Product of Matrices

Why is Linear Algebra Useful?

**Part 6: Deep Learning**

What to Expect from this Part?

What is Machine Learning

**Deep Learning - Introduction to Neural Networks**

Introduction to Neural Networks

Training the Model

Types of Machine Learning

The Linear Model (Linear Algebraic Version)

The Linear Model with Multiple Inputs

The Linear Model with Multiple Inputs and Multiple Outputs

Graphical Representation of Simple Neural Networks

What is the Objective Function?

Common Objective Functions: L2-norm Loss

Common Objective Functions: Cross-Entropy Loss

Optimization Algorithm: 1-Parameter Gradient Descent

Optimization Algorithm: n-Parameter Gradient Descent

**Deep Learning - How to Build a Neural Network from Scratch with NumPy**

Basic NN Example (Part 1)

Basic NN Example (Part 2)

Basic NN Example (Part 3)

Basic NN Example (Part 4)

Basic NN Example Exercises

**Deep Learning - TensorFlow: Introduction**

How to Install TensorFlow

A Note on Installing Packages in Anaconda

TensorFlow Outline and Logic

Actual Introduction to TensorFlow

Types of File Formats, supporting Tensors

Basic NN Example with TF: Inputs, Outputs, Targets, Weights, Biases

Basic NN Example with TF: Loss Function and Gradient Descent

Basic NN Example with TF: Model Output

Basic NN Example with TF Exercises

**Deep Learning - Digging Deeper into NNs: Introducing Deep Neural Networks**

What is a Layer?

What is a Deep Net?

Digging into a Deep Net

Non-Linearities and their Purpose

Activation Functions

Activation Functions: Softmax Activation

Backpropagation

Backpropagation picture

Backpropagation - A Peek into the Mathematics of Optimization

**Deep Learning - Overfitting**

What is Overfitting?

Underfitting and Overfitting for Classification

What is Validation?

Training, Validation, and Test Datasets

N-Fold Cross Validation

Early Stopping or When to Stop Training

**Deep Learning - Initialization**

What is Initialization?

Types of Simple Initializations

State-of-the-Art Method - (Xavier) Glorot Initialization

**Deep Learning - Digging into Gradient Descent and Learning Rate Schedules**

Stochastic Gradient Descent

Problems with Gradient Descent

Momentum

Learning Rate Schedules, or How to Choose the Optimal Learning Rate

Learning Rate Schedules Visualized

Adaptive Learning Rate Schedules (AdaGrad and RMSprop)

Adam (Adaptive Moment Estimation)

**Deep Learning - Preprocessing**

Preprocessing Introduction

Types of Basic Preprocessing

Standardization

Preprocessing Categorical Data

Binary and One-Hot Encoding

**Deep Learning - Classifying on the MNIST Dataset**

MNIST: What is the MNIST Dataset?

MNIST: How to Tackle the MNIST

MNIST: Relevant Packages

MNIST: Model Outline

MNIST: Loss and Optimization Algorithm

Calculating the Accuracy of the Model

MNIST: Batching and Early Stopping

MNIST: Learning

MNIST: Results and Testing

MNIST: Exercises

MNIST: Solutions

**Deep Learning - Business Case Example**

Business Case: Getting acquainted with the dataset

Business Case: Outlining the Solution

The Importance of Working with a Balanced Dataset

Business Case: Preprocessing

Business Case: Preprocessing Exercise

Creating a Data Provider

Business Case: Model Outline

Business Case: Optimization

Business Case: Interpretation

Business Case: Testing the Model

Business Case: A Comment on the Homework

Business Case: Final Exercise

**Deep Learning - Conclusion**

Summary on What You've Learned

What's Further out there in terms of Machine Learning

An Overview of CNNs

DeepMind and Deep Learning

An Overview of RNNs

An Overview of non-NN Approaches

Download All Resources

**Software Integration**

What are Data, Servers, Clients, Requests, and Responses

What are Data Connectivity, APIs, and Endpoints?

Taking a Closer Look at APIs

Communication between Software Products through Text Files

Software Integration - Explained

**Case Study - What's Next in the Course?**

Game Plan for this Python, SQL, and Tableau Business Exercise

The Business Task

Introducing the Data Set

**Case Study - Preprocessing the 'Absenteeism_data'**

What to Expect from the Following Sections?

Importing the Absenteeism Data in Python

Checking the Content of the Data Set

Introduction to Terms with Multiple Meanings

What's Regression Analysis - a Quick Refresher

Using a Statistical Approach towards the Solution to the Exercise

Dropping a Column from a DataFrame in Python

EXERCISE - Dropping a Column from a DataFrame in Python

SOLUTION - Dropping a Column from a DataFrame in Python

Analyzing the Reasons for Absence

Obtaining Dummies from a Single Feature

EXERCISE - Obtaining Dummies from a Single Feature

SOLUTION - Obtaining Dummies from a Single Feature

Dropping a Dummy Variable from the Data Set

More on Dummy Variables: A Statistical Perspective

Classifying the Various Reasons for Absence

Using .concatenate() in Python

EXERCISE - Using .concatenate() in Python

SOLUTION - Using .concatenate() in Python

Reordering Columns in a Pandas DataFrame in Python

EXERCISE - Reordering Columns in a Pandas DataFrame in Python

SOLUTION - Reordering Columns in a Pandas DataFrame in Python

Creating Checkpoints while Coding in Jupyter

EXERCISE - Creating Checkpoints while Coding in Jupyter

SOLUTION - Creating Checkpoints while Coding in Jupyter

Analyzing the Dates from the Initial Data Set

Extracting the Month Value from the "Date" Column

Extracting the Day of the Week from the "Date" Column

EXERCISE - Removing the "Date" Column

Analyzing Several "Straightforward" Columns for this Exercise

Working on "Education", "Children", and "Pets"

Final Remarks of this Section

**Case Study - Applying Machine Learning to Create the 'absenteeism_module'**

Exploring the Problem with a Machine Learning Mindset

Creating the Targets for the Logistic Regression

Selecting the Inputs for the Logistic Regression

Standardizing the Data

Splitting the Data for Training and Testing

Fitting the Model and Assessing its Accuracy

Creating a Summary Table with the Coefficients and Intercept

Interpreting the Coefficients for Our Problem

Standardizing only the Numerical Variables (Creating a Custom Scaler)

Interpreting the Coefficients of the Logistic Regression

Backward Elimination or How to Simplify Your Model

Testing the Model We Created

Saving the Model and Preparing it for Deployment

ARTICLE - A Note on 'pickling'

EXERCISE - Saving the Model (and Scaler)

Preparing the Deployment of the Model through a Module

**Case Study - Loading the 'absenteeism_module'**

Are You Sure You're All Set?

Deploying the 'absenteeism_module' - Part I

Deploying the 'absenteeism_module' - Part II

Exporting the Obtained Data Set as a *.csv

**Case Study - Analyzing the Predicted Outputs in Tableau**

EXERCISE - Age vs Probability

Analyzing Age vs Probability in Tableau

EXERCISE - Reasons vs Probability

Analyzing Reasons vs Probability in Tableau

EXERCISE - Transportation Expense vs Probability

Analyzing Transportation Expense vs Probability in Tableau