This course is offered in a blended format, with in-person and live virtual cohorts attending simultaneously. When registering, select the appropriate registration button below.

Coming Soon!
Lead Instructor(s)
TBD Summer 2023
On Campus or Live Virtual
Course Length
3 Days
Course Fee
Coming Soon!

Reinforcement learning (RL) is enabling exciting advancements in self-driving vehicles, natural language processing, automated supply chain management, financial investment software, and more. In this three-day course, you will acquire the theoretical frameworks and practical tools you need to use RL to solve big problems for your organization.

This course may be taken individually or as part of the Professional Certificate Program in Machine Learning & Artificial Intelligence. Completing the course will contribute 3 days towards the certificate.

Course Overview

Understand whether RL can solve the big problems of your organization. Acquire the theoretical framework and basic tools for implementing RL.

Join professionals from around the world to upgrade your machine learning (ML) toolkit in this three-day RL bootcamp. Through interactive lectures and hands-on exercises, you will (i) understand the difference between supervised learning and RL; (ii) be able to gauge which problems in your organization can be solved using RL; (iii) gain a solid understanding of state-of-the-art Deep RL algorithms; (iv) be able to cast your favorite challenge into the RL framework and recognize the promise and limitations of RL through a hands-on session and live RL Clinic; and (v) be able to reason about which RL algorithm is most appropriate for the problem at hand.

This program includes the unique opportunity to present your organization’s specific technological challenges to MIT faculty during a live RL Clinic, a session designed to help you identify whether RL can be used to solve your problems, determine which approach will be most effective, and design RL applications to resolve the issue. During this process, you will draw on the expertise of the course teaching team, which is composed of recognized industry experts with experience working at 12 firms across multiple industries, from both startups and big tech.

COVID-19 Updates

We fully expect to resume on-campus Short Programs courses during the Summer of 2022. However, the possibility remains of ongoing disruption and restrictions due to COVID-19, which may require that the course be delivered in a live virtual format. Please read more here.

Learning Outcomes
  • Understand the basic principles of RL. Through lectures and an interactive group session, learn when RL can be applied to your business problem and how to pose the problem to obtain maximum gains from RL.
    • Learn when supervised learning is sufficient and when RL can provide a big advantage.
  • Learn about Bandits, Contextual Bandits, and the more general RL formulation.
  • Understand the theory and the practical aspects of how to use popular Deep RL algorithms such as DQN, A3C, PPO, SAC, TD3, and MCTS.
  • Walk through applications of RL algorithms and what made them work.
  • Develop rules of thumb to reason about when to use which Deep RL algorithm.
  • Understand how to structure the observation space, the action space, and the reward function for optimally training the RL agent.
  • Learn about the limitations of Deep RL algorithms, how to tune hyperparameters, and practical tricks.

Lead Instructors

Pulkit Agrawal

Pulkit Agrawal is the Steven and Renee Chair Assistant Professor of Electrical Engineering and Computer Science at MIT and leads the Improbable AI Lab, part of the Computer Science and Artificial Intelligence Lab at MIT and affiliated with the Laboratory for Information and Decision Systems. In the past, Dr. Agrawal has spent time at DeepMind and Qualcomm and served as an advisor for Cavium Inc. He co-founded SafelyYou, an organization that builds fall prevention technology, and the AI Foundry, an incubator for AI startups. He currently serves as an advisor for several startups and has research collaborations with companies such as IBM, Toyota, Sony, and Facebook AI Research (FAIR).


Cathy Wu

Cathy Wu is the Gilbert W. Winslow Career Development Assistant Professor of Civil and Environmental Engineering at MIT and has worked across many fields and organizations, including Microsoft Research, OpenAI, the Google X Self-Driving Car Team, AT&T, Caltrans, Facebook, and Dropbox. Wu is also the founder and Chair of the Interdisciplinary Research Initiative at the ACM Future of Computing Academy.


Program Outline

Day 1 (9:00am - 7:30pm)

  • [9:00-9:30] Welcome: Meet & Greet
  • [9:30-11:00] Session 1 (1.5 hours): What is RL, why RL and basic RL
    • Introduction to decision making
    • What is and isn't RL? How is RL different from supervised learning?
    • The central challenge in RL: Exploration vs. Exploitation
    • When to use RL?
    • Bandits and Contextual Bandits
  • [11:00-11:30] Break (0.5 hours)
  • [11:30-12:30] Session 2: Modeling a Decision Problem and Introduction to Policy Gradients
    • Basic Terminology: Markov decision process, what is an episode, etc.
    • Introduction to Policy Gradients: REINFORCE
    • Variance reduction with baselines, causality, Generalized Advantage Estimation
  • [12:30-13:30] Lunch (1 hour)
  • [13:30-14:00] Office Hours with Instructors or Socialize (30 mins)
  • [14:00-15:30] Session 3: State-of-the-Art Policy Gradient Algorithms
    • Advantage Actor Critic (A2C)
    • Asynchronous Advantage Actor Critic (A3C)
    • Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization (PPO)
    • Hyperparameters and Tricks in Policy Gradients
      • How deep should my network be? What learning rate should I use? How should the network be initialized?
  • [15:30-16:00] Break (0.5 hours)
  • [16:00-17:00] Session 4: Live demo & hands-on implementation
    • Set up an environment in a format amenable to RL algorithms
    • Hands-on Exercise on Policy Gradients
  • [17:00-18:15] Session 5: How to use RL Algorithms? Walk through some applications
    • Recommendation systems
    • Balloon Localization
    • Manipulation
    • Urban Planning
    • Introduction to the Problem Clinic
  • [18:15-19:30] Reception (1.25 hours)

Day 2 (9:30am - 7:00pm)

  • [9:30-11:00] Session 1: Value Based Reinforcement Learning
    • Why value based RL?
    • Connection between dynamic programming and RL
    • Policy Iteration
    • Value Iteration
    • Q-Learning
    • What is off-policy learning?
    • Deep Q-Learning (DQN)
      • Target Network, Replay Buffer
  • [11:00-11:30] Break (0.5 hours)
  • [11:30-13:00] Session 2: Practical Considerations in Deep Q-Learning
    • Double Q-Learning (DDQN)
    • Prioritized Experience Replay
    • RAINBOW: Combining several improvements in Deep Q-Learning
    • Deep Q-Learning for Continuous Action Spaces
      • Deep Deterministic Policy Gradients (DDPG)
      • Soft Actor-Critic (SAC)
      • Twin Delayed DDPG (TD3)
  • [13:00-14:00] Lunch (1 hour)
  • [14:00-15:00] Session 3: Live demo & hands-on implementation
    • Set up an environment in a format amenable to RL algorithms
    • Hands-on Exercise on Deep Q-Learning
  • [15:00-16:00] Group Session: Discuss and Formulate Problems into RL Framework
  • [16:00-16:30] Coffee Break
  • [16:30-17:30] Session 4: Practical Perspectives
    • The Reward Hacking Problem
    • What if my action space is large?
    • What if my RL problem is non-Markov? How to design the state space?
  • [17:30-18:00] Session 5: Safety and Ethics of RL
  • [18:00-19:00] Group Session: Work on Problem Clinic

Day 3 (9:30am - 7:00pm)

  • [9:30-11:30] Session 1 (2 hours): Problem Clinic Presentation & Discussion Part I
  • [11:30-12:00] Break
  • [12:00-13:00] Session 2 (1 hour): Problem Clinic Presentation & Discussion Part II
  • [13:00-14:00] Lunch (1 hour)
  • [14:00-15:30] Session 3: Discussion on Practice and Theory of RL
    • Theory ←→ Practice
    • Augmented Random Search
    • Asymptotic convergence, sample complexity, regret
    • Theoretical Foundations of DQN, policy gradients, bandit methods
  • [15:30-16:00] Session 4: AMA with Instructors
  • [16:00-16:30] Break
  • [16:30-18:00] Session 5: Overview of and Need for Advanced Topics
    • Discussion of the limitations of RL techniques: non-stationarity, data inefficiency
    • Overview of Advanced Topics
      • Based on class interest, we will delve into one of the advanced topics.
    • More Applications
  • [18:00-19:00] Session 6: Office Hours or Dive into a topic of interest to the class


Who Should Attend

This program is ideally suited for technical professionals who wish to understand cutting-edge trends and advances in reinforcement learning. Professionals who are not sure of when and how to apply RL in engineering and business settings will find this program especially useful.

The curriculum is particularly appropriate for professionals with significant experience and demonstrated career progression, such as:

  • Engineers / Managers who want to understand Deep RL and its implications
  • Research scientists who want to improve their ability to utilize Deep RL algorithms
  • Machine learning engineers and software engineers looking to use RL to enhance results derived from supervised learning systems  
  • Data scientists who want to incorporate RL strategies into their machine learning toolkit
  • Data analysts and business analysts who are tasked with solving problems with limited quantities of data
  • Product managers and program managers who need to be able to identify when it is appropriate and effective to apply RL  
  • CTOs and other executives who want to identify how RL can be implemented to address organization-wide challenges


To take full advantage of this program, we recommend that participants have a mathematical background in linear algebra and probability, basic knowledge of deep learning, and experience with programming (preferably Python). This background will help participants follow some of the practical examples more effectively. There are two optional assignments in the program that will require a computer with access to Google Colab, which runs in any browser, or a Unix/Linux terminal.