This course is also offered in an on-campus format, meeting simultaneously with the live online cohort.

Lead Instructor(s): Markus J. Buehler
Date(s): Jul 27 - 30, 2026
Location: Live Online
Course Length: 4 Days
Course Fee: $3,600
CEUs: 2.2

AI for Science has emerged as one of the most dynamic frontiers in modern engineering, reshaping domains from healthcare to infrastructure. At the center of this transformation is materials science—the discipline that sets the physical limits of our world.

As AI systems gain advanced scientific-reasoning capabilities that approach Artificial General Intelligence (AGI) - systems that can integrate knowledge, propose hypotheses, and iteratively refine designs with real-world agency - materials discovery is shifting from intuition-driven search to AI-augmented invention. In this course, you will learn how to apply foundation models, generative methods, and agentic workflows alongside multiscale modeling to compress development timelines and enable high-speed, cost-effective prototyping at the edge of physical feasibility.

This course may be taken individually or as part of the Professional Certificate Program in Design & Manufacturing or the Professional Certificate Program in Machine Learning & Artificial Intelligence.

Course Overview


“In this course, you won't just watch AI in action, you'll collaborate with it - building agents that reason, design, and solve problems alongside you, for you, and teaching you.” – Professor Markus J. Buehler

Traditional materials informatics emphasized data, models, and prediction. We are now entering the era of Agentic AI - where systems can plan, execute, and refine scientific tasks autonomously, with verification through physics-based constraints and validation workflows. This has resulted in unprecedented advances in materials discovery in many verticals, such as new alloys, new composites, novel proteins as biomaterials, and deep integration of intelligence into smart material systems.

This condensed four-day program invites you to the forefront of this shift from prediction to closed-loop discovery. Led by MIT Professor Markus J. Buehler, you will move beyond simple predictive models to build AI agents capable of scientific reasoning, design and manufacturing.

What sets this program apart is its focus on demystifying AI and making the advanced capabilities relevant to your use case. Rather than merely applying off-the-shelf models, you will develop the critical skills to understand how foundation models "think" - i.e., represent and reason over - physics, chemistry and biology.  

You will learn to:

  • Orchestrate agentic workflows: Move from static code to dynamic agents that can read literature, formulate hypotheses, write and execute code, run simulations, suggest experimental validations, and (where applicable) interface with experimental automation pipelines.
  • Harness multimodal foundation models: Integrate text, images, graphs, and voxel data into a single reasoning framework that captures the full context of a material, building unified representations that support retrieval, prediction, and design.
  • Bridge the reality gap: Ground generative designs in multiscale modeling, explicit physical constraints, and verification loops (simulation and, where relevant, experiments), supported by world-model building, so that your AI creations are manufacturable and viable.

The curriculum is grounded in MIT’s motto mens et manus (mind and hand). You will gain hands-on experience with the latest architectures and foundation models, fine-tuning techniques, and reinforcement learning (RL) for developing your own custom reasoning models - including Graph Transformers, Diffusion Models, neural operators, physics-informed neural networks (PINNs), and Large Reasoning Models (LRMs) - applying them to real-world challenges in alloys, biomaterials, and sustainable polymers.

New for 2026: This year’s curriculum introduces massively parallelized AI agents and swarm intelligence for solving complex problems. Other areas of focus are knowledge creation and the new "AI scientist" category that is entering many industries. You will learn to deploy agents that can ingest vast amounts of unstructured data - from handwritten lab notebooks to legacy PDFs - and structure it into actionable insights automatically, unlocking decades of dormant value in your organization.

The live virtual format of this program means you can participate from anywhere in the world, engaging in real-time coding exercises and receiving dozens of code examples and data sets that you can immediately apply to your projects. This practical, hands-on approach will empower you to confidently navigate and utilize AI across your organization, whether you're building models or applying them to solve complex material design challenges.

"By learning the fundamentals of AI, you can make informed decisions about which models and agents to use, and how to apply them strategically, avoiding the common pitfall of rushing into technology without understanding it."  – Professor Markus J. Buehler

NOTE: This program is designed to complement the hybrid course Generative Multiscale Materials Design: Physics, AI, Manufacturing, but can also be completed independently.

Certificate of Completion from MIT Professional Education

Machine Learning for Materials Informatics
Learning Outcomes

By the end of this program, you will be able to:

  • Design enterprise-grade agentic discovery workflows that plan, execute, and refine materials R&D tasks (literature to hypothesis to candidate generation to verification).
  • Build and manage AI agents that search and synthesize scientific knowledge, write and run code, and propose testable validation plans.
  • Apply multimodal foundation models across text, images, graphs, spectra, and 3D/voxel representations to reason over materials context end-to-end.
  • Perform inverse design by translating target functions and constraints into candidate structures for alloys, polymers, biomaterials, and architected/metamaterials.
  • Use state-of-the-art generative models (e.g., diffusion/flow-based methods and graph generative approaches) to propose novel candidates under explicit constraints.
  • Bridge the reality gap with physics grounding by coupling learning to multiscale modeling (atomistic → continuum), constraints, and verification loops.
  • Integrate modern surrogate physics models (e.g., neural interatomic potentials, neural operators/PINNs where appropriate) to accelerate simulation-informed discovery.
  • Convert unstructured organizational knowledge into structured assets using vision-language and document understanding models (lab notebooks, PDFs, reports, micrographs, charts).
  • Evaluate model reliability using uncertainty, robustness tests, out-of-distribution checks, and falsification-style adversarial evaluation.
  • Implement interpretability and governance practices (“glass-box” diagnostics, traceability, audit trails) suitable for high-stakes engineering decisions.
  • Deploy a reusable, cloud-runnable prototype—an agent framework or model pipeline you can adapt to your organization’s materials discovery workflow.

What You Will Take Away

Beyond the lectures and live clinics, you will leave this course with a comprehensive toolkit of assets that you can immediately deploy in your organization:

  • Ready-to-use agent templates: A library of pre-configured "AI Scientist" agents capable of literature search, hypothesis generation, and code execution, built on modern frameworks.
  • Customizable code notebooks: Dozens of Jupyter notebooks covering the entire pipeline (from data curation and fine-tuning LRMs to training diffusion models and running physics verifications).
  • Datasets: Access to curated, clean datasets for alloys, proteins, and composites, specifically designed to benchmark your models during and after the course.
  • Detailed, highly curated lecture materials (lessons, slides, literature, papers).

“An AI that only predicts is an oracle. An AI Scientist closes the loop - reading, hypothesizing, designing, and verifying against physics, imaging and building new worlds.”  – Professor Markus J. Buehler

PROGRAM OUTLINE

NOTE: All times Eastern Daylight Time (UTC-4:00). A few introductory lecture videos and reading materials will be posted ahead of the course.

Day One

9am-noon: From Machine Learning to AGI: The evolution of the "AI Scientist". Agent architectures: Planning, memory, tools, and action. Automating knowledge extraction and organization from unstructured sources (PDFs, lab notes) using LRMs.

1-2pm: Clinic #1: Physics-aware vision models (from convolutional models to attention, multimodal AI). Example application to a data agent.

2-4pm: Digging deeper: Deep neural nets, loss functions, stochastic optimization methods (e.g., stochastic gradient descent), regularization. Pre- and post-training (supervised fine-tuning, RLHF, DPO, GRPO, and variants). Introduction to Transformers and graph neural networks.
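
As a rough illustration of the optimization topics in this session, the following minimal sketch fits a one-variable linear model with stochastic gradient descent on a squared-error loss plus an L2 (weight decay) penalty. It is not course material: the function name `sgd_linear_fit`, the hyperparameters, and the synthetic data are all assumptions chosen for illustration.

```python
import random


def sgd_linear_fit(xs, ys, lr=0.05, weight_decay=1e-4, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent, one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit samples in a new random order each epoch
        for i in indices:
            err = (w * xs[i] + b) - ys[i]
            # Per-sample loss: 0.5*err**2 + 0.5*weight_decay*w**2
            w -= lr * (err * xs[i] + weight_decay * w)
            b -= lr * err
    return w, b


# Hypothetical synthetic data: y = 2x + 1 with small Gaussian noise.
data_rng = random.Random(42)
xs = [i / 10 for i in range(-10, 11)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.05) for x in xs]

w, b = sgd_linear_fit(xs, ys)
print(f"w={w:.2f}, b={b:.2f}")
```

The weight decay term pulls `w` slightly toward zero; with the small value assumed here its effect on the fit is negligible, and larger values trade fit accuracy for smaller weights.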

4-5pm: Clinic #2: Building your first reasoning agent for material failure analysis. Hands-on lab: constructing a basic agent that can search literature, formulate a hypothesis, and write code to test it. How does an agent remember a paper it read two days ago while designing an experiment today? Graph reasoning, RAG (Retrieval-Augmented Generation) architectures for scientific data, and related methods.
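
One simple way to picture how an agent can "remember" earlier reading is retrieval: store notes, then pull the most relevant one back into the prompt when a new task arrives. The sketch below is a toy stand-in for a RAG pipeline, assuming bag-of-words cosine similarity in place of learned embeddings and a vector database. The `NoteMemory` class and the note texts are hypothetical placeholders, not real papers or course code.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NoteMemory:
    """Toy long-term memory: stores notes and retrieves the closest matches."""

    def __init__(self):
        self.notes = []  # list of (text, token counts)

    def add(self, text):
        self.notes.append((text, Counter(tokenize(text))))

    def retrieve(self, query, k=1):
        q = Counter(tokenize(query))
        ranked = sorted(self.notes, key=lambda n: cosine(q, n[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Placeholder notes the agent "read two days ago" (illustrative text only).
memory = NoteMemory()
memory.add("Note 1: fatigue cracks in aluminum alloys initiate at grain boundaries.")
memory.add("Note 2: spider silk toughness arises from beta-sheet nanocrystals.")
memory.add("Note 3: diffusion models generate candidate crystal lattices.")

# Today's task: retrieve relevant context and prepend it to the agent prompt.
context = memory.retrieve("Why do cracks initiate in aluminum under fatigue loading?")
prompt = "Context: " + context[0] + "\nTask: design a fatigue experiment."
print(prompt)
```

In practice the token-count vectors would typically be replaced by dense embeddings, and the retrieved text would be passed to a language model rather than printed.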

5-7pm: Interactive virtual networking reception (get to know peers, the instructor, and make connections)

Day Two

9-10am: Hands-on introduction to PyTorch (example application to fine-tuning an LLM for proteins). Vibe coding and vibe research (tools, best strategy, etc.).

10-11am: Generative AI for materials: Diffusion & flow matching. Theory and application of diffusion models for inverse design. Generating 3D structures: Proteins, metamaterials, and crystal lattices.  
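
For orientation, the forward (noising) half of a DDPM-style diffusion model can be written in a few lines: a clean sample x_0 is mixed with Gaussian noise as x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below assumes a linear beta schedule and a three-number toy "structure"; the learned reverse (denoising) network, which is what actually generates new designs, is not shown, and flow matching is not covered. All names and values are illustrative assumptions.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_t, increasing linearly over the diffusion steps."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alpha(betas)

rng = random.Random(0)
x0 = [0.0, 0.5, 1.0]  # hypothetical coordinates standing in for a structure
x_early = noise_sample(x0, 10, alpha_bars, rng)   # still close to x0
x_late = noise_sample(x0, 999, alpha_bars, rng)   # essentially pure noise
print(x_early, x_late)
```

Training a diffusion model amounts to teaching a network to predict the added noise at each step, so that sampling can run this process in reverse from pure noise.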

11am-noon: Practical guide to tensor algebra and other important math concepts needed. Equivariance and related concepts for machine learning potentials.

1-2pm: Responsible AI. Ensuring these systems are safe, reliable, and deployable in an enterprise environment. Data science, statistics and visualization (includes review of relevant Python toolkits). Collaborative research & open science practices.

2-3:30pm: De novo dataset construction and application to building agentic workflows (covers computer vision tools, autonomous experimentation, robotics). Neural Interatomic Potentials and Neural Operators.

3:30-5pm: Introduction to graph neural networks (applications to molecular systems, truss systems, alloys, proteins, and healthcare; graph transformers). Connecting a generative agent to a physics simulator (e.g., LAMMPS or FEA) to automatically validate AI designs. Model Context Protocol (MCP), tool use, and related technologies.
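
The idea of connecting a generator to a physics check can be sketched as a propose-and-verify loop. In the toy example below, a random sampler stands in for the generative agent, and the closed-form cantilever tip deflection, delta = F L^3 / (3 E I) with I = b h^3 / 12 for a rectangular section, stands in for a simulator such as LAMMPS or FEA. The load, length, deflection limit, and the aluminum-like modulus of 70 GPa are assumptions chosen for illustration.

```python
import random


def tip_deflection(force_n, length_m, youngs_pa, width_m, height_m):
    """Tip deflection of an end-loaded cantilever with a rectangular section."""
    inertia = width_m * height_m ** 3 / 12.0  # second moment of area
    return force_n * length_m ** 3 / (3.0 * youngs_pa * inertia)


def propose_design(rng):
    """Stand-in for a generative model: sample a random cross-section."""
    return {"width_m": rng.uniform(0.01, 0.05), "height_m": rng.uniform(0.01, 0.05)}


def verify(design, max_deflection_m=0.005):
    """Physics check: accept only designs that stay within the deflection limit."""
    d = tip_deflection(100.0, 1.0, 70e9, design["width_m"], design["height_m"])
    return d <= max_deflection_m, d


rng = random.Random(1)
accepted = []
for _ in range(200):
    candidate = propose_design(rng)
    ok, deflection = verify(candidate)
    if ok:
        area = candidate["width_m"] * candidate["height_m"]
        accepted.append((area, candidate, deflection))

# Among the verified designs, prefer the one that uses the least material.
best = min(accepted, key=lambda item: item[0]) if accepted else None
print(best)
```

In a real workflow, the verifier would launch a simulation, and rejected candidates would be fed back to the generator rather than simply discarded.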

Day Three

9-10:30am: Transforming AI for biology and healthcare (AlphaFold, Boltz, RFDiffusion, etc., and applications to protein design, synthesis). Reasoning with foundation models. Fine-tuning Large Language Models (LLMs) for scientific tasks. Geometric deep learning.

10:30am-noon: Building, using and adapting LLMs and LRMs applied to materials (pre-training and fine-tuning, RL, PRefLexOR, reflective agents). Diffusion language models (DLMs) (applications of large language models to materials problems; category theory; time-dependent material phenomena).

1-2pm: Clinic #3: Self-improving and adaptive AI models for inverse materials design and integration into agentic frameworks (AG2, LangChain, CrewAI, MCPs, Skills, …). Using generative learning for protein design and synthesis planning. Translating biological mechanisms into synthetic material concepts. Active learning with negative data.

2-3pm: AI scientists and swarm intelligence. Agent development, orchestration, deployment, decentralized intelligence. Application to creative design tasks, protein design. Case study: Spawning multiple agent instances to explore a design space in parallel, critique each other’s work, and converge on an optimal solution.
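
The case study pattern (spawn agents, explore in parallel, compare results, converge) can be illustrated with a deliberately simple toy. Here each "agent" is a random local search over a one-dimensional design variable, run through a thread pool, and the "critique" step is reduced to keeping the best result and restarting every agent from it. The objective, the agent count, and all function names are hypothetical; real agents would be LLM-driven and mostly I/O-bound, which is why threads are assumed here.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def objective(x):
    """Toy design score to maximize; the peak at x = 3 is an arbitrary choice."""
    return -(x - 3.0) ** 2


def agent_search(seed, start, steps=200, step_size=0.5):
    """One agent: random local search (hill climbing) from its own start point."""
    rng = random.Random(seed)
    best_x, best_score = start, objective(start)
    for _ in range(steps):
        candidate = best_x + rng.uniform(-step_size, step_size)
        score = objective(candidate)
        if score > best_score:
            best_x, best_score = candidate, score
    return best_x, best_score


def run_swarm(num_agents=8, rounds=3):
    """Run several agents in parallel, then restart all from the best result."""
    starts = [random.Random(100 + i).uniform(-10.0, 10.0) for i in range(num_agents)]
    best = None
    for r in range(rounds):
        seeds = range(r * num_agents, (r + 1) * num_agents)
        with ThreadPoolExecutor(max_workers=num_agents) as pool:
            results = list(pool.map(agent_search, seeds, starts))
        # Simplified "critique": compare all results and keep the strongest.
        best = max(results, key=lambda res: res[1])
        starts = [best[0]] * num_agents
    return best


best_x, best_score = run_swarm()
print(f"best x = {best_x:.3f}, score = {best_score:.5f}")
```

Restarting every agent from a single best point converges quickly but sacrifices diversity; keeping several distinct candidates between rounds is a common alternative when the design space has many local optima.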

4-5pm: Case Study: AI in Manufacturing & Scale-up. Using AI to optimize processing parameters (3D printing, casting) based on limited data, real-time time series data collection and integration with LRMs.

Day Four

9-10:30am: Interpretability, scaling and deployment (inference engines, sandboxing, distributed agents, etc.). Model Distillation, Quantization, and SLMs (Small Language Models). AI-assisted software engineering for science. Human-in-the-Loop Interfaces: How does a human scientist interact with a swarm? The UI/UX of agentic science and ecosystems.

10:30am-noon: Clinic #4: Participant group project presentations, feedback and discussion.

noon-1pm: Concluding discussion; graduation ceremony and certificates

Post-course: Participants will have the opportunity to ask additional questions at two office hour sessions, which will be offered one and two weeks after the conclusion of the course.

Who Should Attend

This course is designed for technical professionals and strategic leaders who are ready to transition their organizations from predictive modeling to autonomous discovery. It is ideal for those looking to build, deploy, or manage "AI Scientist" agents and closed-loop research systems.

Professionals who would particularly benefit include:

  • Computational Scientists & ML Engineers
    Why: You need to move beyond standard model training to building agentic workflows and reasoning loops. You will learn to architect systems that orchestrate code, simulation, and data extraction autonomously.
  • R&D Directors, Innovation Leads, VPs of Innovation, and CTOs
    Why: You are responsible for the future of your research organization and for setting the trajectory of innovation and discovery. You need to understand how swarm intelligence, automated knowledge extraction (from legacy PDFs and reports), and autonomous simulation and experimentation can unlock dormant value and accelerate discovery timelines by orders of magnitude.
  • Materials Scientists & Chemists (Experimentalists)
    Why: You want to encode and scale human expertise into AI agents and build autonomous lab facilities. This course teaches you to use Large Reasoning Models (LRMs) and agents to drive and capture your scientific logic, allowing you to scale your problem-solving abilities across thousands of candidate materials.
  • Deep Tech Entrepreneurs & Founders
    Why: You are building the next generation of AI-native companies. You need to master generative design (Diffusion/Flow Matching) and physics-informed verification to prove to investors that your AI-discovered materials are manufacturable and real.
  • Data Strategy & Digital Transformation Leaders
    Why: Your organization is drowning in unstructured data. You will learn to deploy vision-language agents that can read, digitize, and structure decades of handwritten notes, micrographs, and technical reports into actionable knowledge graphs.
  • Sustainability & Circular Economy Directors
    Why: You need radical solutions that change the paradigm quickly. You will apply inverse design and multiscale modeling to discover novel biodegradable polymers, green alloys, or carbon-capture materials that meet strict environmental constraints.
  • IP Professionals & Technology Scouts
    Why: The invention process is changing. You need to understand the capabilities (and hallucinations) of AGI-class/frontier systems to accurately assess the novelty, feasibility, and patentability of AI-generated matter.

REQUIREMENTS

A computer with a reliable internet connection is required. All heavy computation (including Agent Swarms and Diffusion training) will be performed on a provided cloud platform (e.g., Google Colab Pro / AWS). No local GPU is required.