Course is closed
Dates: Jun 01 - 05, 2020
Format: Live Virtual
Course Length: 5 Days


Optimization algorithms lie at the heart of machine learning (ML) and artificial intelligence (AI). The distinctive feature of optimization within ML is its strong departure from textbook approaches: the focus has shifted to a different set of goals driven by big data, non-convex deep learning, and high dimensionality. This departure and shift in focus make it challenging for newcomers and even experienced users to obtain a solid grasp of the fundamental ideas without getting lost in a myriad of tutorials, blogs, and papers.

This course provides an accessible entry point to modeling and optimization for machine learning, key skills needed to use state-of-the-art software and algorithms from machine learning. It covers the underlying theoretical motivations behind widely used optimization algorithms (the “science”), while diving deep into aspects of mathematical modeling (the “art”) to provide students with an intuitive, foundational introduction to this modern and fast-moving research area.

Modeling reduces messy engineering or computational problems to mathematical forms that can be solved by using standard software and techniques. By recognizing mathematical patterns “in the wild,” participants will develop an intuition for which problems are solvable using standard numerical modeling techniques and gain the knowledge and skills to then solve them in practice.
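As a small illustration of this pattern-recognition step, fitting a line to noisy measurements can be recognized as the standard least-squares model min_x ||Ax - b||^2, which off-the-shelf numerical software solves directly. The sketch below is illustrative only (the data and variable names are not course materials):

```python
import numpy as np

# Illustrative data: measurements that roughly follow y = 2*x + 1.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(50)

# Recognize the pattern: line fitting is least-squares, min ||A w - y||^2.
# Design matrix A has one column for the slope and one for the intercept.
A = np.column_stack([x, np.ones_like(x)])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
m, c = coeffs  # recovered slope and intercept
```

Once the problem is written in this standard form, any least-squares routine applies; no custom algorithm is needed.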

After we develop an appropriate model for a machine learning problem, the next step is to choose an optimization technique. Participants in the course will learn to pair mathematical models with efficient optimization algorithms, from stochastic gradient descent to cone programming. Participants will delve into the details of how popular optimization methods work and will receive practical experience interfacing with optimization software through case studies and exercises.
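To make the model-algorithm pairing concrete, here is a minimal stochastic gradient descent sketch for the least-squares model min_w (1/n) Σ_i (a_i·w - b_i)^2, the kind of pairing the course covers. All data and step-size choices below are illustrative assumptions, not course code:

```python
import numpy as np

# Synthetic least-squares instance with a known ground-truth w_true.
rng = np.random.default_rng(1)
n, d = 200, 3
w_true = np.array([1.0, -2.0, 0.5])
A = rng.standard_normal((n, d))
b = A @ w_true + 0.01 * rng.standard_normal(n)

# SGD: update using the gradient of one sample's loss at a time.
w = np.zeros(d)
step = 0.05
for epoch in range(200):
    for i in rng.permutation(n):
        # Gradient of the single-sample loss (a_i . w - b_i)^2.
        grad = 2.0 * (A[i] @ w - b[i]) * A[i]
        w -= step * grad
```

Each update touches a single data point, which is what lets SGD scale to datasets far too large for full-gradient methods.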

By the end of the course, participants will learn how to boil real-world challenges down to their computational essence to make a reasonable estimate of how difficult it would be to design a numerical method to solve them. We will cover a breadth of tools, from numerical linear algebra to convex programming and stochastic/deterministic gradient descent, in the context of practical problems drawn from emerging applications in learning, computer vision, time series analysis, and imaging. Coding and mathematical exercises will reinforce these ideas and expose participants to standard software packages for optimization.

Participant Takeaways

Participants in the course will learn how to:

  • Recognize classes of optimization problems in machine learning and related disciplines.
  • Demystify the “why” and “how” of ubiquitous topics such as regression, deep learning, and large-scale optimization, with a focus on convex and non-convex models.
  • Interface with software for computing optimal solutions to a given machine learning problem.
  • Understand the mathematical underpinnings of optimization methods via examples drawn from machine learning, computer vision, engineering, and data analysis.
  • Understand foundational optimization ideas, including gradient descent, stochastic gradient methods, higher-order methods, and more advanced optimization algorithms.
  • Classify optimization problems by their tractability, difficulty, and compatibility with existing software.
  • Cut through the hype to make more informed choices for their own applications.

Who Should Attend

This course is designed for people working in data science, finance, marketing, computer-aided design, operations, strategy, engineering, research, or computer vision. Typical roles include engineer, programmer, developer, data scientist, researcher, consultant, or marketing analyst.


Participants are required to have a background in linear algebra and multivariable calculus, as well as at least basic programming skills in Python.

Laptops (or tablets) with Python are required for this course. Participants should have administrative privileges for their computers in case Python packages need to be installed during the course.

Program Outline

The course begins with the fundamentals of modeling and optimization, including case studies converting regression and classification problems to mathematical models as well as the basics of deterministic and stochastic gradient descent. We then broaden the capabilities of our modeling language by showing how to incorporate constraints and accelerate optimization with second-order information. After establishing the basics, we consider a variety of more advanced models in machine learning, including neural network training, sparsity and low-rank regularization, metric learning, time-series analysis, and adversarial training of robust models. We conclude with practical discussion drawn from research projects at MIT as well as from participants’ domain areas.

Timing (all times EDT):
Monday (6/1/20)

  • 9:00 Introductions and troubleshooting
  • 11:00 Discussion and coffee (15 min)
  • 11:15 Basic notions of modeling: variables, criteria, constraints
  • 12:15 Lunch break
  • 13:15 Gradient descent, stochastic gradient descent
  • 14:15 Discussion and coffee (30 min)
  • 14:45 Intro to practicum: Modeling and optimization for least-squares
  • 15:30 Practicum [comparing Google Sheets to SGD in Python]
  • 17:00 END 

Tuesday (6/2)

  • 9:30 Second-order methods (Newton-type methods, quasi-Newton)
  • 10:30 Discussion and coffee (30 min)
  • 11:00 Neural network models
  • 12:00 Lunch break
  • 13:15 Working with constraints in optimization
  • 14:15 Discussion and coffee (30 min)
  • 14:45 Case study 3: Optimal transport
  • 15:45 Practicum: CVX and other solvers
  • 17:00 END 
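The second-order methods session above can be previewed with a one-dimensional Newton iteration, which uses curvature (the second derivative) to take much larger, better-scaled steps than plain gradient descent. The objective f(w) = e^w + w^2 below is purely illustrative:

```python
import math

# Minimize f(w) = exp(w) + w^2, a smooth strictly convex function.
def f_prime(w):
    return math.exp(w) + 2.0 * w

def f_double_prime(w):
    return math.exp(w) + 2.0

# Newton's method: w <- w - f'(w) / f''(w).
w = 0.0
for _ in range(20):
    w -= f_prime(w) / f_double_prime(w)
```

Near the minimizer, Newton's method converges quadratically: the number of correct digits roughly doubles per iteration, versus the slow linear progress of first-order methods.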

Wednesday (6/3)

  • 9:30 Sparsity, low-rank optimization, smoothness, and other considerations
  • 10:30 Discussion and coffee
  • 11:00 Case study 4: Nonlinear image analysis and translating it to a solver
  • 12:00 Lunch break
  • 13:00 Advanced models:  GANs, adversarial optimization, robust optimization, cycle consistency
  • 14:00 Discussion and coffee
  • 14:30 Case study 5:  Constructing adversarial examples
  • 15:30 Practicum
  • 17:00 END 
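Case study 5 above concerns constructing adversarial examples. A minimal gradient-sign-style sketch on a fixed linear classifier conveys the idea: perturbing the input a small amount against the gradient of the score flips the prediction. All weights and inputs below are hypothetical:

```python
import numpy as np

# A fixed (hypothetical) logistic classifier: sigmoid(w . x + bias).
w = np.array([1.0, -1.0])
bias = 0.0

def predict(x):
    return 1.0 / (1.0 + np.exp(-(w @ x + bias)))

x = np.array([0.3, -0.2])  # w @ x = 0.5, so classified positive

# Gradient of the score w.r.t. the input is just w for a linear model;
# stepping against its sign lowers the score (a gradient-sign attack).
eps = 0.6
x_adv = x - eps * np.sign(w)
```

For deep networks the principle is the same, except the input gradient must be computed by backpropagation rather than read off in closed form.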

Thursday (6/4)

  • 9:30 Metric learning motivation, models, and optimization
  • 10:30 Discussion and coffee (30 mins)
  • 11:00 Classification models (NN-based) with a “reject” option
  • 12:00 Lunch break
  • 13:00 Industrial time-series case study (modeling and optimization)
  • 14:00 Discussion and coffee (30 mins)
  • 14:30 Image sharpening: model and optimization
  • 15:30 Practicum: Implement second-order trend filtering, or implement metric learning and try kNN
  • 17:00 END

Friday (6/5)

  • 9:30 Interaction of optimization with neural network architecture
  • 10:30 Discussion and coffee
  • 11:00 Case study 7: Clustering, embedding, and visualization
  • 12:00 Lunch break
  • 13:00 Optimization and modeling project discussion
  • 14:00 Discussion and coffee
  • 14:15 Practical guide to OPTML (S+J)
  • 16:00 Course Closing

Course Content

The type of content you will learn in this course, whether it's a foundational understanding of the subject, the hottest trends and developments in the field, or suggested practical applications for industry.

  • Fundamentals (core concepts, understandings, and tools): 30%
  • Latest Developments (recent advances and future trends): 42%
  • Industry Applications (linking theory and the real world): 28%
Delivery Methods

How the course is taught, from traditional classroom lectures and riveting discussions to group projects to engaging and interactive simulations and exercises with your peers.

  • Lecture (delivery of material in a lecture format): 50%
  • Discussions or Group Work (participatory learning): 30%
  • Labs (demonstrations, experiments, simulations): 20%

Assumed Knowledge

What level of expertise and familiarity the material in this course assumes you have. The greater the amount of introductory material taught in the course, the less you will need to be familiar with when you attend.

  • Introductory (appropriate for a general audience): 75%
  • Advanced (in-depth exploration at the graduate level): 25%