 

Reinforcement learning algorithms for continuous states, discrete actions

I'm trying to find an optimal policy in an environment with continuous states (dim = 20) and discrete actions (3 possible actions). There is one peculiarity: under the optimal policy, one action (call it "action 0") should be chosen much more frequently than the other two (~100 times more often; the other two actions are riskier).

I've tried Q-learning with NN value-function approximation. The results were rather poor: the NN learns to always choose "action 0". I think policy gradient methods (on the NN weights) may help, but I don't understand how to apply them to discrete actions.

Could you give some advice on what to try (e.g. algorithms, papers to read)? What are the state-of-the-art RL algorithms when the state space is continuous and the action space is discrete?

Thanks.

asked Nov 19 '14 by centuri0n


1 Answer

Applying Q-learning in continuous (states and/or actions) spaces is not a trivial task. This is especially true when trying to combine Q-learning with a global function approximator such as a NN (I understand that you refer to the common multilayer perceptron and the backpropagation algorithm). You can read more on Rich Sutton's page. A better (or at least easier) solution is to use local approximators, such as Radial Basis Function (RBF) networks (there is a good explanation of why in Section 4.1 of this paper).
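To make that concrete, here is a minimal sketch of the idea: linear Q-learning on RBF features, with one weight vector per discrete action. The state dimension (20) and action count (3) come from your question; the feature map (scikit-learn's RBFSampler), learning rate, discount factor, feature count, and the random states used to fit the feature map are purely illustrative assumptions, not a definitive implementation.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler

STATE_DIM, N_ACTIONS = 20, 3          # from the question
GAMMA, ALPHA, N_FEATURES = 0.99, 0.01, 500   # illustrative hyperparameters

# Random Fourier features approximating an RBF kernel; fitted here on random
# placeholder states (in practice, use states collected from the environment).
rbf = RBFSampler(gamma=1.0, n_components=N_FEATURES, random_state=0)
rbf.fit(np.random.randn(1000, STATE_DIM))

w = np.zeros((N_ACTIONS, N_FEATURES))  # one linear Q-model per action

def q_values(state):
    # Returns Q(state, .) for all actions plus the feature vector phi(state).
    phi = rbf.transform(state.reshape(1, -1))[0]
    return w @ phi, phi

def td_update(s, a, r, s_next, done):
    # One-step Q-learning update on the weights of the action that was taken.
    q_s, phi = q_values(s)
    q_next, _ = q_values(s_next)
    target = r if done else r + GAMMA * np.max(q_next)
    w[a] += ALPHA * (target - q_s[a]) * phi
```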

On the other hand, the dimensionality of your state space may be too high to use local approximators. Thus, my recommendation is to use other algorithms instead of Q-learning. A very competitive algorithm for continuous states and discrete actions is Fitted Q Iteration, which is usually combined with tree-based methods to approximate the Q-function.
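As a rough illustration (not the canonical implementation), the following sketches Fitted Q Iteration with extremely randomized trees in the spirit of Ernst et al., assuming you already have a batch of transitions as NumPy arrays S, A, R, S_next, done (with done as 0/1 values); the iteration count and tree hyperparameters are just placeholders.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

N_ACTIONS, GAMMA, N_ITERATIONS = 3, 0.99, 50   # illustrative settings

def fitted_q_iteration(S, A, R, S_next, done):
    # Encode the discrete action as an extra input column next to the state.
    X = np.hstack([S, A.reshape(-1, 1)])
    q = None
    for _ in range(N_ITERATIONS):
        if q is None:
            targets = R  # first iteration: Q equals the immediate reward
        else:
            # Evaluate the current Q-model for every action in the next state.
            q_next = np.column_stack([
                q.predict(np.hstack([S_next, np.full((len(S_next), 1), a)]))
                for a in range(N_ACTIONS)
            ])
            targets = R + GAMMA * (1 - done) * q_next.max(axis=1)
        # Refit a fresh tree ensemble on the updated regression targets.
        q = ExtraTreesRegressor(n_estimators=50, min_samples_leaf=5)
        q.fit(X, targets)
    return q
```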

Finally, a common practice when the number of actions is low, as in your case, is to use an independent approximator for each action: instead of a single approximator that takes the state-action pair as input and returns a Q-value, use three approximators, one per action, each taking only the state as input (see the sketch below). You can find an example of this in Example 3.1 of the book Reinforcement Learning and Dynamic Programming Using Function Approximators.
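Here is an illustrative sketch of that one-approximator-per-action setup, again using a tree regressor as a stand-in for whatever approximator you prefer; it assumes every action appears in the training batch.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

class PerActionQ:
    """One independent regressor per discrete action, each mapping state -> Q."""

    def __init__(self, n_actions=3):
        self.models = [ExtraTreesRegressor(n_estimators=50) for _ in range(n_actions)]

    def fit(self, states, actions, targets):
        # Train each model only on the samples where its action was taken.
        for a, model in enumerate(self.models):
            mask = actions == a
            if mask.any():
                model.fit(states[mask], targets[mask])

    def predict(self, states):
        # Q-values stacked as columns, one column per action.
        return np.column_stack([m.predict(states) for m in self.models])
```

Greedy action selection is then just the argmax over the three per-action predictions for the current state, so the discrete action never has to be encoded as a regressor input.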

answered Nov 26 '22 by Pablo EM