Imagine a binary classification problem like sentiment analysis. Since we have the labels, cant we use the gap between actual - predicted as reward for RL ?
I wish to try Reinforcement Learning for Classification Problems
Interesting thought! According to my knowledge it can be done.
Imitation Learning - On a high level it is observing sample trajectories performed by the agent in the environment and use it to predict the policy given a particular stat configuration. I prefer Probabilistic Graphical Models for the prediction since I have more interpretability in the model. I have implemented a similar algorithm from the research paper: http://homes.soic.indiana.edu/natarasr/Papers/ijcai11_imitation_learning.pdf
Inverse Reinforcement Learning - Again a similar method developed by Andrew Ng from Stanford to find the reward function from sample trajectories, and the reward function can be used to frame the desirable actions. http://ai.stanford.edu/~ang/papers/icml00-irl.pdf
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With