Designing fast, efficient deep-learning systems

Vivienne Sze

The past few years have seen a broad range of deep-learning applications, from self-driving cars to healthcare, with most of the attention going to new algorithms. But the hardware platforms on which those algorithms run are equally important, according to Vivienne Sze, an associate professor in MIT’s Department of Electrical Engineering and Computer Science. “Accuracy is not enough—energy efficiency and speed are important as well,” she said in a recent phone interview. A navigation system, for example, must process data in real time.

The design of efficient hardware systems to support deep learning is the focus of an MIT Professional Education course titled Designing Efficient Deep Learning Systems, which Sze will teach March 28-29 at the Samsung Research America campus in Mountain View, CA, and again this summer on MIT’s campus at a date to be determined. The course will cover hardware platforms such as CPUs and GPUs, how algorithms run on them, and optimization techniques, but its emphasis will be on custom hardware.

There are limitations to deploying machine learning on a self-driving car or smartphone. In addition to having an accurate algorithm, Sze said, the system you build must be fast and energy-efficient; navigation, for instance, has to process data in real time. “It’s not just accuracy, where the focus has been,” she said. “We are trying to address speed and energy by looking across the whole stack from algorithm to hardware. The focus is on specialized hardware, its limitations, and the computation that advanced hardware has enabled.”
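
As a rough illustration of the speed constraint Sze describes, the back-of-the-envelope sketch below checks whether a network fits a real-time frame budget. Every number in it (frame rate, MAC count, throughput, utilization) is a hypothetical assumption chosen for illustration, not a figure from the course.

```python
# Back-of-the-envelope latency check for a real-time vision pipeline.
# Every number here is an illustrative assumption, not a measured value.

FRAME_RATE_HZ = 30                    # e.g., a navigation camera at 30 fps
FRAME_BUDGET_S = 1.0 / FRAME_RATE_HZ  # ~33 ms per frame

model_macs = 600e6       # hypothetical network: 600M multiply-accumulates per frame
peak_macs_per_s = 100e9  # hypothetical platform: 100 GMAC/s peak throughput
utilization = 0.3        # real workloads rarely sustain peak throughput

latency_s = model_macs / (peak_macs_per_s * utilization)
verdict = "meets" if latency_s <= FRAME_BUDGET_S else "misses"
print(f"Estimated latency {latency_s * 1e3:.1f} ms vs. "
      f"budget {FRAME_BUDGET_S * 1e3:.1f} ms -> {verdict} real time")
```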

The course will emphasize the importance of the interplay between hardware and algorithm designers. The goal, she said, is to teach people how to think about the problem with an overall architectural approach.

Sze’s research interests include energy-aware signal-processing algorithms and low-power circuit and system design for deep learning, computer vision, autonomous navigation, and image/video processing. Work during 2015 led to the publication of a paper she coauthored titled “Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks.” At the International Solid-State Circuits Conference in February 2016, MIT researchers described the chip as achieving 10 times the efficiency of a mobile GPU, as reported in MIT News.
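
A central idea behind accelerators like Eyeriss is that moving data typically costs far more energy than computing on it, so the chip’s dataflow is organized to maximize on-chip data reuse. The sketch below makes that trade-off concrete; the per-access energy constants and the reuse factor are illustrative assumptions, not figures from the Eyeriss paper.

```python
# Why dataflow matters: a DRAM access can cost orders of magnitude more
# energy than the multiply-accumulate (MAC) it feeds. The constants below
# are illustrative assumptions, not figures from the Eyeriss paper.

E_MAC_PJ = 1.0        # assumed energy of one MAC, in picojoules
E_DRAM_PJ = 200.0     # assumed energy of one DRAM access, in picojoules

macs = 100e6          # hypothetical conv layer: 100M MACs
operands_per_mac = 3  # weight, input activation, partial sum

# Naive dataflow: every operand fetched from DRAM for every MAC.
naive_pj = macs * (E_MAC_PJ + operands_per_mac * E_DRAM_PJ)

# Reuse-oriented dataflow: each DRAM fetch is reused many times on chip
# (on-chip buffer access energy ignored here for simplicity).
reuse_factor = 100    # assumed average on-chip reuse per DRAM fetch
reuse_pj = macs * (E_MAC_PJ + operands_per_mac * E_DRAM_PJ / reuse_factor)

print(f"Naive dataflow:  {naive_pj / 1e6:.0f} uJ")
print(f"Reused dataflow: {reuse_pj / 1e6:.0f} uJ "
      f"(~{naive_pj / reuse_pj:.0f}x less energy)")
```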

There is a lot of research activity at the edge, Sze said during our phone interview. “We looked at both the algorithms and the hardware side in developing the Eyeriss chip,” she added. “The goal was to address the needs of a smartphone, wearable, or other embedded device.”

Power consumption can be addressed at both the algorithm and the platform level. “You can do a lot from a hardware perspective, but some aspects are limited to the algorithm itself,” Sze said. She cited the High Efficiency Video Coding (HEVC) standard as an example, noting that she was involved in the algorithm changes that improved its efficiency. HEVC supersedes a 2004 algorithm, she said, and we now consume video very differently, with streaming to mobile devices. She shared an Engineering Emmy Award last fall for her work on HEVC.
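
One common algorithm-level lever of the kind Sze alludes to is reduced-precision arithmetic: storing weights in 8 bits rather than 32 cuts model size by 4x and lowers the energy of every memory access. Below is a minimal sketch of generic uniform (affine) quantization in NumPy; it is a textbook formulation for illustration, not a technique taken from Sze’s course or from HEVC.

```python
import numpy as np

def quantize_uint8(w: np.ndarray) -> tuple[np.ndarray, float, float]:
    """Uniform affine quantization of float32 weights to uint8 (generic scheme)."""
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / 255.0 or 1.0  # guard against constant tensors
    q = np.round((w - w_min) / scale).astype(np.uint8)
    return q, scale, w_min

def dequantize(q: np.ndarray, scale: float, offset: float) -> np.ndarray:
    """Recover approximate float32 weights from the uint8 representation."""
    return q.astype(np.float32) * scale + offset

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)
q, scale, offset = quantize_uint8(weights)
max_err = float(np.abs(dequantize(q, scale, offset) - weights).max())
print(f"{weights.nbytes} -> {q.nbytes} bytes (4x smaller), max error {max_err:.4f}")
```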

Of course, an alternative to computing at the edge is to send data to the cloud for processing. But many applications cannot share information with the cloud, Sze said. Latency might be longer than can be tolerated, or you may simply not have a good connection. And in medical applications, for example, security and privacy are concerns.

Consequently, extensive work is continuing on developing the fast and efficient deep-learning hardware platforms that will be the subject of Sze’s course. Sze cited a recent article in The New York Times noting that at least 45 companies are working on chips for deep-learning applications—at least five of which have each raised more than $100 million from investors.

Sze said the upcoming course will provide a broad perspective on the deep-learning landscape, with a focus on speed and power and the interplay of algorithms and hardware. Algorithm developers in attendance can learn how platforms vary and how to adapt their code to run efficiently. Hardware developers can learn what types of neural networks are out there and how to support them. And investors can gain insights about what questions to ask and what metrics to apply when evaluating startups seeking funding.

What distinguishes this course is the focus not just on algorithms but on building systems and making them efficient, she concluded.

Sze’s course is part of the new Professional Certificate Program in Machine Learning and Artificial Intelligence. The full portfolio of courses hasn’t been announced yet (courses are still being rolled out and finalized), but, in addition to Sze’s Designing Efficient Deep Learning Systems, initial offerings include Modeling and Optimization for Machine Learning and Applications, Machine Learning for Big Data and Text Processing, and Machine Learning for Healthcare.

For more information about Designing Efficient Deep Learning Systems, and to register, visit MIT Professional Education.

Sze is also co-instructor, along with Joel Emer, an MIT professor and senior distinguished research scientist at Nvidia, of a course for MIT undergraduate and graduate students titled Hardware Architecture for Deep Learning.

Source: Evaluation Engineering