
Updating an old system to Q-learning with Neural Networks

Recently I've been reading a lot about Q-learning with neural networks, and I thought about updating an existing old optimization system in a power plant boiler. It consists of a simple feed-forward neural network approximating an output from many sensory inputs. That output is then fed to a linear model-based controller, which in turn outputs an optimal action, so the whole model can converge to a desired goal.

Identifying linear models is a time-consuming task, so I thought about refurbishing the whole thing into model-free Q-learning with a neural network approximation of the Q-function. I drew a diagram to ask whether I'm on the right track.

[diagram: old system vs. proposed Q-learning redesign]

My question: if you think I've understood the concept correctly, should my training set consist of State Feature vectors on one side and Q_target - Q_current (here I'm assuming an increasing reward) on the other, in order to force the whole model towards the target? Or am I missing something?

Note: The diagram shows a comparison between the old system in the upper part and my proposed change on the lower part.

EDIT: Does a State Neural Network guarantee Experience Replay?

asked Oct 20 '16 by Leb_Broth


People also ask

How does a neural network make the Q-learning algorithm more efficient?

These networks have the same architecture but different weights. Every N steps, the weights of the main network are copied to the target network. Using both networks makes the learning process more stable and helps the algorithm learn more effectively.
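The periodic copy described above can be sketched in a few lines. This is a minimal illustration (numpy only; the layer sizes, the sync interval, and the stand-in "update" are hypothetical), not a full training loop:

```python
import numpy as np

# Two-network setup: "main" is trained every step; "target" only
# receives a copy of main's weights every COPY_EVERY steps.
rng = np.random.default_rng(0)
main_weights = [rng.normal(size=(4, 8)), rng.normal(size=(8, 2))]
target_weights = [w.copy() for w in main_weights]

COPY_EVERY = 100  # N: how often to sync the target network

for step in range(1, 301):
    # A real agent would do a gradient step on main_weights here;
    # this constant shift is just a stand-in for learning.
    main_weights = [w + 0.01 for w in main_weights]
    if step % COPY_EVERY == 0:
        target_weights = [w.copy() for w in main_weights]  # sync
```

Between syncs the target network stays frozen, which is what keeps the bootstrap target from chasing its own updates.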

How are neural networks used in deep Q-learning?

In deep Q-learning, we use a neural network to approximate the Q value function. The network receives the state as input (whether that is the frame of the current state or a single value) and outputs the Q values for all possible actions. The action with the largest output is our next action.
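As a concrete sketch of that "state in, one Q value per action out" shape (numpy only; the state size, hidden size, and action count are made up for illustration):

```python
import numpy as np

# Hypothetical sizes: 4 state features, 16 hidden units, 3 actions.
rng = np.random.default_rng(42)
n_state, n_hidden, n_actions = 4, 16, 3
W1 = rng.normal(scale=0.1, size=(n_state, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden, n_actions))

def q_values(state):
    """One ReLU hidden layer; linear output layer, one unit per action."""
    h = np.maximum(0.0, state @ W1)
    return h @ W2

state = rng.normal(size=n_state)
q = q_values(state)          # shape (n_actions,): a Q value for every action
action = int(np.argmax(q))   # greedy choice: the largest Q value
```

The key point is that a single forward pass yields Q values for all actions at once, so action selection is just an argmax over the output layer.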

Can neural networks be used for reinforcement learning?

In the case of deep reinforcement learning, a neural network generalizes over the agent's stored experiences and thus improves the way the task is performed.
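In practice, the storing itself is usually done by a replay buffer that sits next to the network: transitions are appended as the agent acts and sampled at random for training, which breaks the temporal correlation between consecutive steps. A minimal sketch (the transition contents here are fake placeholders):

```python
import random
from collections import deque

# Replay buffer: a bounded FIFO of (s, a, r, s', done) transitions.
buffer = deque(maxlen=10_000)

def store(state, action, reward, next_state, done):
    buffer.append((state, action, reward, next_state, done))

def sample(batch_size):
    # Uniform random minibatch, decorrelated from the episode order.
    return random.sample(buffer, batch_size)

for t in range(100):                  # stand-in for an interaction loop
    store(t, t % 2, 1.0, t + 1, False)

batch = sample(32)                    # minibatch for one gradient step
```

Once the buffer exceeds its `maxlen`, the oldest transitions are silently dropped, so memory stays bounded.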


1 Answer

You might simply make the output layer of your network produce the Q values of all actions in the current state. A poorly drawn diagram is here

You can therefore take advantage of the NN's ability to output multiple Q values at a time. Then just backprop using the loss derived from Q(s, a) <- Q(s, a) + alpha * (reward + discount * max(Q(s', a')) - Q(s, a)), where max(Q(s', a')) can be easily computed from the output layer.
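Written out numerically, the bracketed term is the TD target, and the network's regression target for the taken action. Here is the tabular form of the update above (hypothetical sizes and transition values; for a network you would minimize the squared error against the same target instead of writing into a table):

```python
import numpy as np

alpha, discount = 0.1, 0.99

Q = np.zeros((5, 2))          # toy table: 5 states, 2 actions
s, a, reward, s_next = 0, 1, 1.0, 3

td_target = reward + discount * np.max(Q[s_next])  # reward + gamma * max Q(s', a')
td_error = td_target - Q[s, a]
Q[s, a] += alpha * td_error   # Q(s,a) <- Q(s,a) + alpha * TD error

# NN version: minimize (td_target - Q_net(s)[a])**2 by backprop,
# treating td_target as a fixed constant for the gradient step.
```

Note that the gradient only flows through Q(s, a) for the action actually taken; the other output units get zero error on that sample.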

Please let me know if you have further questions.

answered Nov 15 '22 by xtt