By Devavrat Shah
We are in the golden age of machine learning. From Microsoft's new computer vision system outperforming humans to Google's AI algorithm mastering the ancient game of Go, scientists have already achieved what many thought would take years to accomplish. There is growing excitement about what new applications deep learning will enable next. Will we soon rely on computers to keep us safe on our daily commute in self-driving cars? Will we use machines to diagnose us based on our symptoms and medical history?
There's much hype around the promise of deep learning, but the reality is there is still a lot we don't know and understand about how it works. Remarkable progress has been made, yet issues and limitations still remain. If we want to realize the potential of deep learning, we must first solve the mathematical mysteries lurking beneath the surface.
Deep Learning = Layered Learning
For starters, it helps to be clear about what deep learning really is and what it's not. If you have a smartphone, then chances are the software on it that translates your voice into words uses deep learning. The fact of the matter is, speech recognition technology has been around for years -- deep learning just made it faster.
Deep learning is also nothing new. It's just a subfield of machine learning concerned with algorithms called artificial neural networks, which are inspired by the structure and function of the brain. Neural networks were first proposed in 1944 by Warren McCulloch and Walter Pitts, two University of Chicago researchers who moved to MIT in 1952 as founding members of what's sometimes called the first cognitive science department.
To better understand the concept of deep learning, it helps to think of the term as more or less interchangeable with "layered learning." Deep learning involves creating complex, hierarchical representations from simple building blocks to solve high-level problems. The network learns something simple at the initial level in the hierarchy, then sends the information to the next level where the information is combined into something more complex. The process continues, with each level building from the input it received from the previous level.
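The layering idea can be made concrete with a toy sketch. The two-layer network below uses hand-picked weights (an assumption made purely for illustration; a real network would learn them from data), and the names step, layer, and two_layer_xor are hypothetical. The first layer computes two simple features of its inputs, OR and AND; the second layer combines them into something more complex, the XOR function, which no single layer of this kind can compute on its own.

```python
import numpy as np

def step(z):
    # Threshold activation: 1 where the input is positive, else 0.
    return (z > 0).astype(int)

def layer(x, weights, bias):
    # One layer: a weighted sum of the inputs followed by the threshold.
    return step(x @ weights + bias)

# Hand-picked (not learned) weights, chosen only for illustration.
# Layer 1 computes two simple features of the inputs: OR and AND.
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Layer 2 combines them into "OR and not AND", which is XOR.
W2 = np.array([[1.0],
               [-1.0]])
b2 = np.array([-0.5])

def two_layer_xor(x):
    hidden = layer(x, W1, b1)     # simple building blocks
    return layer(hidden, W2, b2)  # combined into something more complex

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
outputs = two_layer_xor(inputs).ravel()
print(outputs)  # [0 1 1 0]
```

Each level here works only with what the previous level hands it, which is the sense in which deep learning is "layered."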
Under the Hood
At the heart of deep learning lies an optimization problem: the deeper the neural network gets, the harder that problem becomes. There are many simpler models in machine learning, such as support vector machines and logistic regression, that have certain mathematical guarantees, but that's not the case for deep neural networks. The optimization problems they pose come out non-convex and unwieldy, and the algorithms we know are not guaranteed to solve them. We use them anyway -- and somehow, miraculously, they still produce relatively accurate results. We just don't know why.
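A rough way to see why guarantees matter is to run the same simple algorithm, gradient descent, on a convex function and on a non-convex one. The sketch below uses two one-dimensional toy functions picked only for illustration; they are assumptions, not the loss of any real model, and the helper gradient_descent with its step size and step count is hypothetical. On the convex function every starting point leads to the same minimum. On the non-convex one the answer depends on where you start.

```python
def gradient_descent(grad, x0, lr=0.01, steps=2000):
    # Repeatedly move a small step against the gradient.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Convex toy function f(x) = x^2, with gradient 2x.
def convex_grad(x):
    return 2 * x

# Non-convex toy function g(x) = x^4 - 3x^2 + x, with gradient 4x^3 - 6x + 1.
# It has two separate valleys, one on each side of zero.
def nonconvex_grad(x):
    return 4 * x**3 - 6 * x + 1

convex_from_left = gradient_descent(convex_grad, -2.0)
convex_from_right = gradient_descent(convex_grad, 2.0)
nonconvex_from_left = gradient_descent(nonconvex_grad, -2.0)
nonconvex_from_right = gradient_descent(nonconvex_grad, 2.0)

print("convex:    ", convex_from_left, convex_from_right)
print("non-convex:", nonconvex_from_left, nonconvex_from_right)
```

Both convex runs end up essentially at zero, while the two non-convex runs settle in different valleys. A deep network's training problem is non-convex in a vastly higher-dimensional space, which is part of why the same kind of guarantee is not available there.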
Therein lies another problem. Deep learning is rapidly reshaping the boundary of what we think of as possible. It is enabling machine learning in a way that is both awesome and exciting. However, it's also a technology whose driving algorithms we don't completely understand. That is why it is so critical for professionals in the data science space to understand the basic foundation of deep learning: which types of optimization problems are easy and which ones aren't; which approaches work, or seem to work, even when they shouldn't; and other issues, such as what makes image classification so challenging for computers when humans are so good at it.
Deep learning has been able to conquer some very tough challenges, and that's a great reason for being optimistic. However, issues exist -- many are unsettling, if not baffling, to researchers and data scientists alike. We must overcome these problems before we can realize deep learning's truly amazing applications. Ongoing education can play a leading role by teaching professionals the fundamental math and underlying theories behind these very new machine-learning techniques, which in turn will allow us to address this important challenge and bring us much closer to creating true artificial intelligence.
About the Author
Devavrat Shah, codirector of the Data Science: Data to Insights course, is a professor in MIT’s Department of Electrical Engineering and Computer Science, director of MIT’s Statistics and Data Science Center (SDSC), and a core faculty member at the MIT Institute for Data, Systems, and Society (IDSS). He is also a member of MIT’s Laboratory for Information and Decision Systems (LIDS) and the Operations Research Center (ORC). The online course is open to any data science professional wishing to learn how to apply data science techniques to more effectively address an organization’s needs.