COVID-19 Updates: We fully expect to resume on-campus Short Programs courses during the Summer of 2022. However, the possibility remains of ongoing disruption and restrictions due to COVID-19 which may require that the course be delivered via live virtual format. Please read more here.

Dates: Jul 18 - 22, 2022
Location: On Campus
Course Length: 5 Days
Continuing Education Units: 3.3 CEUs

Enhance your knowledge of the quantitative and computational realms of data science through the lens of regression analysis. Over the course of five days, you’ll learn to maximize the power of your advanced computing methods and identify strategies for fitting your data to models. Alongside global peers, you’ll gain a deeper understanding of the underlying mathematical models that form the basis of data science—and learn which models work best in different circumstances. 

Course Overview

This course may be taken individually or as part of the Professional Certificate Program in Machine Learning & Artificial Intelligence or the Professional Certificate Program in Biotechnology & Life Sciences.

This course aims to teach a suite of algorithms and concepts to a diverse set of participants interested in the general concept of fitting data to models. It starts with mostly simple linear algebra and computational methods, and introduces some more difficult mathematical concepts towards the end. This progression also, by design, fits our approach of morning lectures and afternoon practice on personal computers. The combined teaching system provides ample opportunity for hands-on learning, and participants leave the course with practical knowledge of the basic algorithms.

Applied data science is used in nearly every industry today. It lies at the intersection of the quantitative (statistics and optimization), the computational (programming and IT), and the domain-specific (business knowledge).

Unfortunately, data science education presently seems to overemphasize nonparametric methods, such as Artificial Intelligence and Machine Learning (more on this below), perhaps because such methods seem irresistibly powerful and arcane.

AI and ML certainly can be powerful, but only the practitioner with a firm grasp of the fundamentals of data and models can leverage such methods with a sure hand, knowing when and how to use them and, importantly, when not to.

The present class is such a foundational course in data and models. Through the lens of regression analysis, a far-reaching discipline with roots in mathematics, statistics, and optimization, Foundations of Data and Models introduces students to the quantitative and (to a lesser degree) computational realms of data science.

One way to unpack the field of data and models is to bisect it into two general categories:

Parametric Methods
In these approaches, a reasonable mathematical model of the system under consideration is proposed, and the “regression task” is to determine the parameters of that mathematical model as uniquely and as accurately as possible. Typical methods for doing so include Least Squares, Simulated Annealing, Genetic Algorithms, Quasi-Random and Grid Search, etc.
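As a minimal sketch of the parametric idea described above, the following Python example proposes a straight-line model and recovers its two parameters in two ways: by linear least squares and by a brute-force grid search. The synthetic data, the noise level, the grid ranges, and the function names are assumptions chosen for illustration; they are not taken from the course materials.

```python
import numpy as np

def fit_line_least_squares(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    # Design matrix: a column of ones (intercept) and a column of x (slope).
    G = np.column_stack([np.ones_like(x), x])
    params, *_ = np.linalg.lstsq(G, y, rcond=None)
    return params[0], params[1]

def fit_line_grid_search(x, y, intercepts, slopes):
    """Try every (intercept, slope) pair on a grid and keep the best one."""
    best_a, best_b, best_misfit = None, None, np.inf
    for a in intercepts:
        for b in slopes:
            misfit = np.sum((y - (a + b * x)) ** 2)
            if misfit < best_misfit:
                best_a, best_b, best_misfit = a, b, misfit
    return best_a, best_b

# Synthetic data (assumed for illustration): y = 2 + 0.5 x plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=x.size)

a_ls, b_ls = fit_line_least_squares(x, y)
a_gs, b_gs = fit_line_grid_search(
    x, y, np.linspace(0.0, 4.0, 81), np.linspace(0.0, 1.0, 101)
)
print(f"least squares: intercept={a_ls:.3f}, slope={b_ls:.3f}")
print(f"grid search:   intercept={a_gs:.3f}, slope={b_gs:.3f}")
```

Both approaches should land near the parameters used to generate the data. The grid search is limited by its grid spacing, but it needs no derivatives and extends directly to models that are nonlinear in their parameters.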

Nonparametric Methods
Here no mathematical model is assumed, though this doesn’t mean a mathematical model is not used. Rather, the mathematical model is usually baked into the method and is sufficiently general that it can mimic a wide variety of possibly very nonlinear behaviors arising from the system under consideration. Typical methods include Principal Components, Neural Networks or Deep Learning, Artificial Intelligence (AI), Machine Learning (ML), etc.
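To make this concrete, here is a minimal sketch of principal components, one of the methods named above, computed with a singular value decomposition. No model of the underlying system is supplied; the directions of greatest variance are extracted from the data alone. The two-column synthetic dataset and the function name are assumptions made for illustration.

```python
import numpy as np

def principal_components(data):
    """Return (directions, explained_variance) for data with one sample per row."""
    centered = data - data.mean(axis=0)
    # Rows of vt are orthonormal directions, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (data.shape[0] - 1)
    return vt, explained_variance

# Synthetic data (assumed for illustration): two strongly correlated columns.
rng = np.random.default_rng(1)
t = rng.normal(size=500)
data = np.column_stack([t, 2.0 * t + rng.normal(scale=0.1, size=500)])

components, variance = principal_components(data)
fraction = variance / variance.sum()
print("first direction:", components[0])
print("fraction of variance per direction:", fraction)
```

In this example the first direction should be close to parallel to (1, 2), up to sign, and should carry nearly all of the variance, because the second column is almost exactly twice the first.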

There are pros and cons to each method.

Foundations of Data and Models is a course in parametric methods of regression, which provides the student an intuitive understanding of the quantitative principles needed to excel when their data science journey arrives at AI and ML.

The course covers such topics as Statistics, Least Squares, Backus-Gilbert Methods, Simple Random and Grid Search Algorithms, Annealing and Genetic Algorithms, Errors in Nonlinear Regression, Solving Large Systems, Robust Regression with Regularization, Neural Networks, and an Introduction to Artificial Intelligence.

We strongly suggest that students take this (or a similar) course before embarking on Artificial Intelligence and Machine Learning.


Laptops for which you have administrative privileges are required for this course. PCs are recommended. Tablets will not be sufficient for the computing activities performed in this course.

Participants are encouraged to study a basic text prior to attendance. Two suggestions are:

  • Data Reduction and Error Analysis for the Physical Sciences, P. R. Bevington and D. K. Robinson, McGraw-Hill, Inc., 2nd ed., 1992.
  • Applied Regression Analysis, N. R. Draper and H. Smith, John Wiley and Sons, Inc., 2nd ed., 1981.

Participant Takeaways

  • Examining how to fit data to models
  • Defining linear least squares, non-linear least squares, singular value decomposition, sensitivity analysis, experiment design, and parameter error estimation
  • Appreciating grid search, random search, simulated annealing, genetic algorithms, neural networks, and large inverse systems
  • Investigating principles leading to rapid application of methods
  • Evaluating the results of pre-programmed computer exercises

Who Should Attend

This course is ideal for anyone who fits data to models. It is broad-based, and participants from widely differing fields are encouraged to attend, including engineering, business, natural sciences, geoscience, medicine, statistics, and economics.

Familiarity with computing and statistics is desirable. A fair background in linear algebra is highly recommended. The course is a condensed version of a regular MIT class with the same title, taught by Professor Morgan. The course has also been given at NASA, the University of the West Indies in Barbados, Sakarya University in Turkey, Stanford University, the University of Science and Technology of China, the Cyprus Institute, and Texas A&M University.

Recent and past participants in this course have come from: Air Force Office of Scientific Research (AFOSR), Amgen Inc., AT&T, BAE Systems, Bank of America, Boeing, Boehringer Ingelheim Pharmaceuticals, BP America, Cisco Systems, Cox Communications, Delphi, Draper Laboratory, Dupont, EMC, Environmental Protection Agency, ExxonMobil Chemical, General Motors, Hitachi (Japan), Intel, Johnson & Johnson, Korea Power Co., Kraft Foods, Los Alamos Labs, Mathworks, Mayo Clinic, Merck & Co Inc, Merrill Lynch, Motorola, Naval Research Laboratory, New York University, NTT (Japan), Nokia Research Center, Phillips Exeter Academy, Philips North America, Pioneer Investments, Polaroid Corporation, Salesforce, Sandia National Labs, Saudi Arabian Monetary Agency, Toshiba Corporation, University of Pennsylvania, University of the West Indies, the U.S. Air Force, and Verizon Wireless.

Program Outline

Class runs 9:00 am to 5:00 pm Monday-Friday.

Daily Schedule:
9:00 am - 12:00 pm - Lecture
12:00 pm - 1:00 pm - Lunch Break 
1:00 pm - 5:00 pm - Lab Exercises

The format of each day is generally the same: mornings are devoted to lectures, while participants spend the afternoons running pre-programmed software based on the morning lectures. During the afternoons, we pause the class often to discuss progress and to offer tips and suggestions. Participants can work individually or in pairs at the computer.

Individual lectures will address the following topics:

  • Philosophy of Data and Models
  • Statistics
  • Straight Line Data Analysis
  • Least Squares
  • Levenberg-Marquardt and Ridge Regression Algorithms
  • Damped Least Squares Comparison
  • Stochastic Inverse
  • Singular Value Decomposition
  • Random and Grid-Search Methods
  • Simulated Annealing and Genetic Algorithms
  • Neural Networks
  • Parameter Error Estimates
  • Large Inverse Problems
  • Experimental Design

Note that the order of the lectures can vary from that given above. A bound copy (and an electronic version) of all PowerPoint lecture notes is given to each participant, to follow lectures and make notes.




“The course efficiently provided a broad understanding of a wide variety of methods to a very varied and interesting group of students.”
“Course was well designed. Lab work was very helpful. Application to real-world problems was well illustrated.”
“I enjoyed the courses taken at MIT this summer. They combined a large amount of theory with lab work in an accelerated fashion. These courses have been the best post-bachelor’s courses I have taken thus far.”
“I found it to be a very stimulating and exciting environment. I felt that the instructors were very knowledgeable in the area and were willing to discuss issues related to applications beyond the classroom. Overall, I would attend courses at MIT Professional Education - Short Programs in the future and would recommend the program to colleagues.”
“The lab portions of the class were thoughtfully planned and very instructive.”
“The instructors were excellent, and the in-lab reviews with other participants were enlightening.”
“To remain competitive, we need to implement process improvements across the organization to greatly improve efficiency while simultaneously increasing the robustness and efficacy of our products. This knowledge provides additional tools to accomplish this.”
“Definitely a good balance between the lectures in the morning which gave the theory and the labs in the afternoon which allowed time to work on the practical application.”
“The knowledge will help me to better analyze customer data and build more sophisticated simulation models.”


The type of content you will learn in this course, whether it's a foundational understanding of the subject, the hottest trends and developments in the field, or suggested practical applications for industry.

  • Fundamentals: Core concepts, understandings, and tools - 75%
  • Latest Developments: Recent advances and future trends - 25%

Delivery Methods

How the course is taught, from traditional classroom lectures and riveting discussions to group projects to engaging and interactive simulations and exercises with your peers.

  • Lecture: Delivery of material in a lecture format - 40%
  • Discussion: Guided discussion reinforcing lectures and computer lab work - 15%
  • Labs: Demonstrations, experiments, simulations - 45%

What level of expertise and familiarity the material in this course assumes you have. The greater the amount of introductory material taught in the course, the less you will need to be familiar with when you attend.

  • Introductory: Appropriate for a general audience - 30%
  • Specialized: Assumes experience in practice area or field - 50%
  • Advanced: In-depth explorations at the graduate level - 20%