Is Reinforcement Learning Right for Your AI Problem?
In the world of machine learning, reinforcement learning is an important paradigm that is often combined with deep learning, in which the human brain is loosely mimicked through a hierarchical structure of human-made, artificial neural networks.
Reinforcement learning (RL) is a basic machine learning paradigm that does not require the raw data to be labeled, as supervised learning typically does. Instead of checking an algorithm's output against a known correct answer, RL provides a reward indicating whether a decision was a good one. RL is based on interactions between an AI system and its environment: the algorithm receives a numerical score based on the outcome of its actions, and positive behaviors are "reinforced" to refine the algorithm over time. In recent years, RL has been behind super-human performance on Go, Atari games and many other applications.
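That interact-score-reinforce loop can be made concrete with a tiny sketch. The environment below is a hypothetical two-armed bandit invented for illustration (the payout probabilities, step counts, and function names are all assumptions, not anything from the article): the agent tries actions, receives a numerical reward, and nudges its value estimates toward the rewards it observed.

```python
import random

def pull(arm: int) -> float:
    """Hypothetical environment: arm 1 pays off more often than arm 0."""
    return 1.0 if random.random() < (0.3 if arm == 0 else 0.7) else 0.0

def train(steps: int = 5000, epsilon: float = 0.1, lr: float = 0.1) -> list:
    values = [0.0, 0.0]  # estimated reward for each action
    for _ in range(steps):
        # explore occasionally, otherwise exploit the best-known action
        if random.random() < epsilon:
            arm = random.randrange(2)
        else:
            arm = max(range(2), key=values.__getitem__)
        reward = pull(arm)
        # "reinforce": move the estimate toward the observed reward
        values[arm] += lr * (reward - values[arm])
    return values

random.seed(0)
print(train())  # the estimate for arm 1 should end up higher
```

Nothing here is deep learning yet; in deep RL the table of value estimates is replaced by a neural network, but the loop is the same.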
Imagine training a machine learning agent to trade stocks. One option is to provide the system with many examples of good strategies – i.e., labeled information about whether to sell a particular stock at a particular time or not. This is the well-known supervised learning paradigm. Because the agent is trying to mimic good strategies, it cannot outperform them. How can we find strategies that outperform the expert? The answer is RL.
But while RL is a powerful approach to AI, it is not a fit for every problem, and there are multiple types of RL.
Ask yourself these six questions to decide which might help you with what you are trying to solve:
- Does My Algorithm Need to Make a Sequence of Decisions?
RL is a perfect fit for problems that require sequential decision-making – that is, a series of decisions that all affect one another. If you are developing an AI program to win at a game, it is not enough for the algorithm to make one good decision; it must make a whole sequence of good decisions. By providing a single reward for a positive outcome, RL weeds out solutions that result in low rewards and elevates those that enable an entire sequence of good decisions.
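One standard way a single terminal reward is spread across a whole sequence is the discounted return, a textbook RL quantity sketched here with made-up numbers: each step's score folds in the rewards that followed it, so earlier decisions share credit for the final outcome.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for each time step t."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# Three intermediate moves earn nothing; only the final outcome is rewarded,
# yet every step in the sequence receives some credit.
print(discounted_returns([0.0, 0.0, 0.0, 1.0]))
```

The discount factor `gamma` controls how far back the final reward reaches: closer to 1 and early moves get nearly full credit, closer to 0 and only the last moves matter.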
- Do I Have an Existing Model?
If you want to write a program for a robot to pick up a physical object, then you can use the laws of physics to inform your model. But if you are trying to write a program to maximize returns in the stock market, there is no existing model that can be used. Instead, you will need to use heuristics that have been manually tuned over time. But these heuristics might be suboptimal. Typically, RL is a good fit when there is no existing model to rely on or you want to improve over an existing decision-making strategy.
- How Much Data Do I Have? What is at Stake if a Wrong Decision is Made?
The amount of data you already have and the cost of making bad decisions may help you to determine whether to use online or offline RL.
For instance, imagine you are running a video platform, and you need to train an algorithm to offer recommendations to users. If you have no data, then you have no option but to interact with the user and make recommendation decisions in real time, using an online process. Such exploration comes at a cost – a few bad recommendations made while the system is learning can disappoint the user. However, if you already have large amounts of data, you can develop a good policy without interacting with specific users. This is offline RL training.
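The offline idea can be sketched in a few lines. The log below is fabricated for illustration (the item names, rewards, and `fit_policy` helper are all assumptions): a policy is estimated entirely from previously recorded interactions, with no live user ever seeing an exploratory recommendation.

```python
from collections import defaultdict

# Hypothetical interaction log: (item recommended, reward observed)
logged_data = [
    ("cooking", 1), ("sports", 0), ("cooking", 1),
    ("news", 0), ("sports", 1), ("cooking", 0),
]

def fit_policy(log):
    """Estimate each item's value purely from logged data (offline)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for item, reward in log:
        totals[item] += reward
        counts[item] += 1
    values = {item: totals[item] / counts[item] for item in counts}
    return max(values, key=values.get), values

best, values = fit_policy(logged_data)
print(best, values)  # "cooking" has the highest estimated reward in this log
```

Real offline RL must also handle the fact that the log only covers actions the old system tried; this sketch ignores that, but the contrast with online learning stands: all the learning happens before any user is touched.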
- Does My Goal Change?
Sometimes in AI, your target never changes. With stocks, you are always going to want to maximize your returns. Such a problem is not goal-conditioned, because you are always solving for the same goal. But in other cases, your goal might be a moving target. Consider Loon, Google’s recently shuttered effort to build giant balloons to beam the internet to rural areas. Here, the optimal position for each balloon is different. For such instances, goal-conditioned RL is a better fit.
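The defining trick of the goal-conditioned setting is that the goal becomes an input: both the reward and the policy take the current target as an argument, so one learned policy can chase many targets. The toy coordinates and greedy policy below are illustrative assumptions, not Loon's actual controller.

```python
def reward(position, goal):
    """Negative distance to the current goal: higher is better."""
    return -abs(position[0] - goal[0]) - abs(position[1] - goal[1])

def greedy_step(position, goal):
    """A trivial goal-conditioned policy: move one unit toward the goal."""
    def step(p, g):
        return p + (1 if g > p else -1 if g < p else 0)
    return (step(position[0], goal[0]), step(position[1], goal[1]))

pos, goal = (0, 0), (3, -2)
for _ in range(5):
    pos = greedy_step(pos, goal)
print(pos, reward(pos, goal))  # the same policy works for any goal passed in
```

A fixed-goal problem like maximizing stock returns would bake the objective into the reward; here, changing `goal` changes the behavior without retraining anything.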
Read the rest of the article on Enterprise AI.