 

Reinforcement Learning vs Operations Research

I was wondering when one would decide to apply Reinforcement Learning to problems that have previously been tackled by mathematical optimisation methods - think the Traveling Salesman Problem, Job Scheduling, or Taxi Sharing Problems.

Since Reinforcement Learning aims at minimising/maximising a certain cost/reward function, much as Operations Research attempts to optimise the result of a certain cost function, I would assume that problems that can be solved by one of the two fields could also be tackled by the other. However, is this the case? Are there tradeoffs between the two? I haven't really seen much research on RL for the problems stated above, but I may be mistaken.

If anyone has any insights at all, they would be highly appreciated!!

Antonia Calvi asked Aug 10 '18

People also ask

What is the difference between operations research and machine learning?

The point to note here is that machine learning models are concerned with a single task, prediction, whereas operations research comprises a large collection of distinct methods for specific classes of problems.

What is the difference between reinforcement learning and machine learning?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation.

Is Operations Research part of artificial intelligence?

Operations Research may be regarded as part of Artificial Intelligence (at least to the extent that both make use of data to support decision-making processes), which naturally includes Machine (& Deep) Learning (ML).

Can reinforcement learning be used for optimization?

When applied to optimization problems, reinforcement learning can be seen as a learning, heuristic search strategy. After training on a set of problems, a reinforcement learning policy can efficiently generate solutions for similar, unseen problems.
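As a toy illustration of that "learning, heuristic search" idea (a hypothetical example of my own, not taken from the answers below): tabular Q-learning can learn a policy for a tiny shortest-path problem, where states 0..4 lie on a line, the goal is state 4, and each step costs -1, so maximising reward minimises path length.

```python
import random

# States 0..4 on a line; goal is state 4; each non-goal step costs -1.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, 1]                       # step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1       # learning rate, discount, exploration

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (0.0 if s2 == GOAL else -1.0), s2 == GOAL

random.seed(0)
for _ in range(500):                    # training episodes
    s, done = 0, False
    while not done:
        if random.random() < eps:       # explore
            a = random.choice(ACTIONS)
        else:                           # exploit current Q estimates
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The greedy policy after training steps right toward the goal.
policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)]
print(policy)   # [1, 1, 1, 1]
```

The policy is learned purely from interaction data; nothing about the problem's structure was given to the agent in advance.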


2 Answers

Here are my two cents. Although both approaches have a common goal (optimal decision making), their fundamental working principles are different. In essence, Reinforcement Learning is a data-driven approach, where the optimisation is achieved through agent-environment interaction (i.e., data). Operations Research, on the other hand, uses methods that require deeper knowledge of the problem and/or impose stronger assumptions.

There are many problems, especially academic or toy problems, where both approaches, RL and OR, can be applied. In real-world applications, I guess that if you can meet all the assumptions required by OR, RL wouldn't achieve better results. Unfortunately, those assumptions cannot always be met, and in such cases RL is the more useful option.

Notice, however, that there exist methods for which the difference between RL and OR is not clear.

Pablo EM answered Oct 02 '22


Pablo provided a great explanation. My research is actually on reinforcement learning vs. model predictive control (MPC), a control approach based on trajectory optimization. Reinforcement learning is just a data-driven optimization algorithm and can be used for the examples you mention. Here is a paper for the traveling salesman problem using RL.
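To make "RL as a data-driven optimization algorithm" concrete, here is a hedged sketch of my own (a toy, not the method of any particular paper): tabular Q-learning on a tiny 4-city TSP, with made-up distances. The state is (current city, set of visited cities), an action is the next unvisited city, and the reward is the negative travel distance, so maximising reward minimises tour length.

```python
import random

# Hypothetical 4-city symmetric distance matrix (made up for illustration).
D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
N = len(D)
Q = {}
eps = 0.2          # exploration rate; a learning rate of 1 is fine here
random.seed(1)     # because transitions and rewards are deterministic

def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(3000):                     # training episodes
    city, visited = 0, frozenset([0])
    while len(visited) < N:
        choices = [c for c in range(N) if c not in visited]
        if random.random() < eps:
            a = random.choice(choices)
        else:
            a = max(choices, key=lambda c: q((city, visited), c))
        nxt_visited = visited | {a}
        if len(nxt_visited) == N:         # tour ends: add the trip home
            target = -D[city][a] - D[a][0]
        else:
            rest = [c for c in range(N) if c not in nxt_visited]
            target = -D[city][a] + max(q((a, nxt_visited), c) for c in rest)
        Q[((city, visited), a)] = target  # alpha = 1 update
        city, visited = a, nxt_visited

# Greedy rollout with the learned Q-values.
city, visited, tour = 0, frozenset([0]), [0]
while len(visited) < N:
    choices = [c for c in range(N) if c not in visited]
    city = max(choices, key=lambda c: q((city, visited), c))
    visited |= {city}
    tour.append(city)
tour.append(0)
cost = sum(D[a][b] for a, b in zip(tour, tour[1:]))
print(tour, cost)
```

On an instance this small the learned policy recovers an optimal tour, but note that nothing guarantees this in general; the policy is only as good as the experience it was trained on.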

The biggest differences are really these:

Reinforcement Learning Method

  • Does not need a model, only a "playground" in which to try different actions in the environment and learn from them (i.e., a data-driven approach)
  • Does NOT guarantee optimality in complex problems, due to the nonlinear mapping of states to actions. In multiple-input multiple-output problems, RL uses nonlinear function approximators to solve tasks, and convergence is no longer guaranteed the moment these are used
  • Great for problems where it is hard or impossible to derive a model.
  • Extremely difficult to train, but cheap online calculation
  • Inherently adaptive. If the conditions of the environment change, RL can usually adapt by learning the new environment.
  • Worst of all, decisions made by RL are uninterpretable. Advanced RL algorithms are built from multiple neural networks, so if our RL car driver drives off a cliff, it is nearly impossible to identify why it did such a thing.

Optimization Approaches

  • Performance is dependent on the model. If the model is bad, the optimization will be terrible.

  • Because performance depends on the model, identifying a "perfect" model is extremely expensive. In the energy industry, such a model for a single plant costs millions, especially because operating conditions change over time.

  • GUARANTEES optimality. Many published papers go into the proofs that these approaches guarantee robustness, feasibility, and stability.

  • Easy to interpret. Controls and decisions made with an optimization approach are easy to interpret, because you can go into the model and work out why a certain action was taken. In the RL case, the policy is usually a neural network and a complete black box. Therefore, RL is currently RARELY used for safety-sensitive problems.

  • Very expensive online calculation, depending on the prediction horizon, because at each time step we have to optimize the trajectory given the current states.
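The optimality guarantee and its cost can be seen in miniature with exhaustive search on a tiny 4-city TSP (distances made up for illustration, a hypothetical example of my own): the answer is provably optimal, but the work grows factorially with the number of cities.

```python
import itertools

# Hypothetical 4-city symmetric distance matrix (made up for illustration).
D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
n = len(D)

def tour_cost(order):
    tour = (0,) + order + (0,)              # fix city 0 as the depot
    return sum(D[a][b] for a, b in zip(tour, tour[1:]))

# Enumerate all (n-1)! orderings of the remaining cities: guaranteed
# optimal, but hopeless beyond a handful of cities.
best = min(itertools.permutations(range(1, n)), key=tour_cost)
print((0,) + best + (0,), tour_cost(best))  # provably optimal tour
```

For 4 cities that is 6 permutations; for 20 cities it is already about 1.2 * 10^17, which is why practical OR solvers rely on branch-and-bound, cutting planes, and other techniques rather than raw enumeration.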

Rui Nian answered Oct 02 '22