 

Reinforcement Learning vs Operations Research

I was wondering when one would decide to apply Reinforcement Learning to problems that have previously been tackled by mathematical optimisation methods - think the Traveling Salesman Problem, Job Scheduling, or Taxi Sharing Problems.

Since Reinforcement Learning aims at minimising/maximising a certain cost/reward function, much as Operations Research attempts to optimise the result of a certain cost function, I would assume that problems that can be solved by one of the two fields could also be tackled by the other. However, is this the case? Are there tradeoffs between the two? I haven't really seen much research on RL for the problems stated above, but I may be mistaken.

If anyone has any insights at all, they would be highly appreciated!!

Antonia Calvi asked Aug 10 '18

People also ask

What is the difference between operations research and machine learning?

The point to note here is that machine learning models are concerned with a single task, prediction, whereas operations research comprises a large collection of distinct methods for specific classes of problems.

What is the difference between reinforcement learning and machine learning?

Reinforcement learning is the training of machine learning models to make a sequence of decisions. The agent learns to achieve a goal in an uncertain, potentially complex environment. In reinforcement learning, an artificial intelligence faces a game-like situation.

Is Operations Research part of artificial intelligence?

Operations Research may be regarded as part of Artificial Intelligence (at least to the extent that both make use of data to support decision-making processes), which naturally includes Machine (& Deep) Learning (ML).

Can reinforcement learning be used for optimization?

When applied to optimization problems, reinforcement learning can be seen as a learning, heuristic search strategy. After training on a set of problems, a reinforcement learning policy can efficiently generate solutions for similar, unseen problems.
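As a toy illustration of that "learning, heuristic search" idea (a hypothetical example of my own, not taken from the answers below): tabular Q-learning can learn a policy for a tiny shortest-path problem, where states 0..4 lie on a line, the goal is state 4, and each step costs -1, so maximising reward minimises path length.

```python
import random

# States 0..4 on a line; goal is state 4; each non-goal step costs -1.
N_STATES, GOAL = 5, 4
ACTIONS = [-1, 1]                       # step left or right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1       # learning rate, discount, exploration

def step(s, a):
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (0.0 if s2 == GOAL else -1.0), s2 == GOAL

random.seed(0)
for _ in range(500):                    # training episodes
    s, done = 0, False
    while not done:
        if random.random() < eps:       # explore
            a = random.choice(ACTIONS)
        else:                           # exploit current Q estimates
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The greedy policy after training steps right toward the goal.
policy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(GOAL)]
print(policy)   # [1, 1, 1, 1]
```

The policy is learned purely from interaction data; nothing about the problem's structure was given to the agent in advance.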


2 Answers

Here are my two cents. Although both approaches have a common goal (optimal decision making), their fundamental working principles are different. In essence, Reinforcement Learning is a data-driven approach, where the optimisation is achieved through agent-environment interaction (i.e., data). Operations Research, on the other hand, uses methods that require deeper knowledge of the problem and/or impose stronger assumptions.

There are many problems, especially academic or toy problems, where both approaches, RL and OR, can be applied. In real-world applications, I guess that if you can meet all the assumptions required by OR, RL wouldn't achieve better results. Unfortunately, those assumptions cannot always be met, and in such cases RL is the more useful option.

Notice, however, that there exist methods for which the difference between RL and OR is not clear.

Pablo EM answered Oct 02 '22


Pablo provided a great explanation. My research is actually on reinforcement learning vs. model predictive control (MPC), a control approach based on trajectory optimization. Reinforcement learning is just a data-driven optimization algorithm and can be used for the examples you mention. Here is a paper for the traveling salesman problem using RL.
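To make "RL as a data-driven optimization algorithm" concrete, here is a hedged sketch of my own (a toy, not the method of any particular paper): tabular Q-learning on a tiny 4-city TSP, with made-up distances. The state is (current city, set of visited cities), an action is the next unvisited city, and the reward is the negative travel distance, so maximising reward minimises tour length.

```python
import random

# Hypothetical 4-city symmetric distance matrix (made up for illustration).
D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
N = len(D)
Q = {}
eps = 0.2          # exploration rate; a learning rate of 1 is fine here
random.seed(1)     # because transitions and rewards are deterministic

def q(s, a):
    return Q.get((s, a), 0.0)

for _ in range(3000):                     # training episodes
    city, visited = 0, frozenset([0])
    while len(visited) < N:
        choices = [c for c in range(N) if c not in visited]
        if random.random() < eps:
            a = random.choice(choices)
        else:
            a = max(choices, key=lambda c: q((city, visited), c))
        nxt_visited = visited | {a}
        if len(nxt_visited) == N:         # tour ends: add the trip home
            target = -D[city][a] - D[a][0]
        else:
            rest = [c for c in range(N) if c not in nxt_visited]
            target = -D[city][a] + max(q((a, nxt_visited), c) for c in rest)
        Q[((city, visited), a)] = target  # alpha = 1 update
        city, visited = a, nxt_visited

# Greedy rollout with the learned Q-values.
city, visited, tour = 0, frozenset([0]), [0]
while len(visited) < N:
    choices = [c for c in range(N) if c not in visited]
    city = max(choices, key=lambda c: q((city, visited), c))
    visited |= {city}
    tour.append(city)
tour.append(0)
cost = sum(D[a][b] for a, b in zip(tour, tour[1:]))
print(tour, cost)
```

On an instance this small the learned policy recovers an optimal tour, but note that nothing guarantees this in general; the policy is only as good as the experience it was trained on.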

The biggest differences are really these:

Reinforcement Learning Method

  • Does not need a model, only a "playground" in which to try different actions in the environment and learn from them (i.e., a data-driven approach)
  • Does NOT guarantee optimality in complex problems, due to the nonlinear mapping of states to actions. In multiple-input multiple-output problems, RL uses nonlinear function approximators to solve tasks, and convergence is no longer guaranteed the moment these are used
  • Great for problems where it is hard or impossible to derive a model.
  • Extremely difficult to train, but cheap online calculation
  • Inherently adaptive. If the conditions of the environment change, RL can usually adapt by learning the new environment.
  • Worst of all, decisions made by RL are uninterpretable. Advanced RL algorithms are built from multiple neural networks, so if our RL car driver drives off a cliff, it is nearly impossible to identify why it did such a thing.

Optimization Approaches

  • Performance is dependent on the model. If the model is bad, the optimization will be terrible.

  • Because performance depends on the model, identifying a "perfect" model is extremely expensive. In the energy industry, such a model for a single plant costs millions, especially because operating conditions change over time.

  • GUARANTEES optimality. Many published papers go into the proofs that these approaches guarantee robustness, feasibility, and stability.

  • Easy to interpret. Controls and decisions made with an optimization approach are easy to interpret, because you can go into the model and work out why a certain action was taken. In the RL case, the policy is usually a neural network and a complete black box. Therefore, RL is currently RARELY used for safety-sensitive problems.

  • Very expensive online calculation, depending on the prediction horizon, because at each time step we have to optimize the trajectory given the current states.
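The optimality guarantee and its cost can be seen in miniature with exhaustive search on a tiny 4-city TSP (distances made up for illustration, a hypothetical example of my own): the answer is provably optimal, but the work grows factorially with the number of cities.

```python
import itertools

# Hypothetical 4-city symmetric distance matrix (made up for illustration).
D = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
n = len(D)

def tour_cost(order):
    tour = (0,) + order + (0,)              # fix city 0 as the depot
    return sum(D[a][b] for a, b in zip(tour, tour[1:]))

# Enumerate all (n-1)! orderings of the remaining cities: guaranteed
# optimal, but hopeless beyond a handful of cities.
best = min(itertools.permutations(range(1, n)), key=tour_cost)
print((0,) + best + (0,), tour_cost(best))  # provably optimal tour
```

For 4 cities that is 6 permutations; for 20 cities it is already about 1.2 * 10^17, which is why practical OR solvers rely on branch-and-bound, cutting planes, and other techniques rather than raw enumeration.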

Rui Nian answered Oct 02 '22