Reinforcement Learning with Python Explained for Beginners

Introduction to Course and Instructor:
Introduction to Course and Instructor

Motivation for Reinforcement Learning:
What is Reinforcement Learning
What is Reinforcement Learning: Hiders and Seekers by OpenAI
RL Versus Other ML Frameworks
Why Reinforcement Learning
Examples of Reinforcement Learning
Limitations of Reinforcement Learning

Terminology of Reinforcement Learning:
What is Environment
What is Environment 2
What is Agent
What is State
State Belongs to Environment and not to Agent
What is Action
What is Reward

GridWorld Example:
Setup 1
Setup 2
Setup 3
Policy Comparison
Deterministic Environment
Stochastic Environment
Stochastic Environment 2
Stochastic Environment 3
Non-Stationary Environment
GridWorld Summary

Markov Decision Process Prerequisites:
Probability 2
Probability 3
Conditional Probability
Conditional Probability Fun Example
Joint Probability
Joint Probability 2
Joint Probability 3
Expected Value
Conditional Expectation
Modeling Uncertainty of Environment
Modeling Uncertainty of Environment 2
Modeling Uncertainty of Environment 3
Modeling Uncertainty of Environment Stochastic Policy
Modeling Uncertainty of Environment Stochastic Policy 2
Modeling Uncertainty of Environment Value Functions
Running Averages
Running Averages 2
Running Averages as Temporal Difference

Elements of Markov Decision Process:
Markov Property
State Space
Action Space
Transition Probabilities
Reward Function
Discount Factor

More on Reward:
MOR Quiz 1
MOR Quiz Solution 1
MOR Quiz 2
MOR Quiz Solution 2
MOR Reward Scaling
MOR Infinite Horizons
MOR Quiz 3
MOR Quiz Solution 3

Solving Markov Decision Processes:
MDP Recap
Value Functions
Optimal Value Function
Optimal Policy
Bellman Equation
Value Iteration
Value Iteration Quiz
Value Iteration Quiz Gamma Missing
Value Iteration Solution
Problems of Value Iteration
Policy Evaluation
Policy Evaluation 2
Policy Evaluation 3
Policy Evaluation Closed Form Solution
Policy Iteration
State Action Values
V and Q Comparisons

Value Approximation:
What Does it Mean that MDP is Unknown
Why Transition Probabilities are Important
Model-Based Solutions
Model-Free Solutions
Monte-Carlo Learning
Monte-Carlo Learning Example
Monte-Carlo Learning Limitations

Temporal Difference - Q-Learning:
Running Average
Learning Rate
Learning Equation
TD Algorithm
Exploration Versus Exploitation
Epsilon Greedy Policy
Q-Learning Implementation for MAPRover Clipped

TD Lambda:
N-Step Lookahead
TD Q-Learning TD Lambda
TD Q-Learning TD Lambda TD(Lambda) MAPRover Activity

Project FrozenLake (OpenAI Gym):
FrozenLake 1
FrozenLake Implementation