 

Efficient reward range in deep reinforcement learning

When selecting reward values in DQN, Actor-Critic, or A3C, are there any common rules for choosing the reward value?

I have heard that a reward range of (-1, +1) is quite an efficient choice.

Can you offer any suggestions, and the reasoning behind them?

asked Dec 17 '25 by WKIm

1 Answer

Ideally, you want to normalize your rewards (i.e., zero mean and unit variance). In your example, rewards between -1 and 1 are roughly consistent with this. I believe the reason is that normalization speeds up gradient descent when updating your neural network's parameters, and it also lets your RL agent distinguish good and bad actions more effectively.
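
Here is a minimal sketch of how you might normalize rewards on the fly, using Welford's online mean/variance algorithm. This is just an illustration, not anything from a specific library; the class name and API are made up:

```python
import numpy as np

class RewardNormalizer:
    """Running reward normalizer (Welford's online mean/variance)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, r):
        # Fold one observed reward into the running statistics.
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def normalize(self, r):
        # Shift to zero mean and rescale to unit variance.
        std = np.sqrt(self.m2 / max(self.count, 1)) + self.eps
        return (r - self.mean) / std
```

For what it's worth, the original DQN paper took an even simpler route and clipped rewards to [-1, 1]; in practice something like np.clip(r, -1.0, 1.0) is often good enough.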

An example: imagine we are building an agent to cross a street. If it crosses successfully, it gets a reward of 1; if it gets hit by a car, it gets a reward of -1; and each step yields a reward of 0. Relative to the scale of the rewards, success (+1) sits far above failure (-1).

However, if we instead give the agent a reward of 1,000,000,001 for successfully crossing the road and a reward of 999,999,999 for getting hit by a car (this scenario and the one above are identical once normalized), the success is no longer as pronounced. Moreover, once you discount such high rewards, the absolute gap between the two outcomes shrinks even further, making them even harder to tell apart.
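
You can verify both points numerically. The snippet below (my own illustration, not from the answer) shows that the two reward schemes standardize to exactly the same values, and that discounting shrinks the absolute gap between the two huge rewards:

```python
import numpy as np

small = np.array([1.0, -1.0])                     # success vs. hit by car
big = np.array([1_000_000_001.0, 999_999_999.0])  # same scenario, shifted

def standardize(x):
    # Rescale to zero mean and unit variance.
    return (x - x.mean()) / x.std()

print(standardize(small))  # [ 1. -1.]
print(standardize(big))    # [ 1. -1.]  -> identical once normalized

# Discounting shrinks the absolute gap between the two huge rewards
# (from 2.0 down to ~0.73 after 100 steps) while both stay enormous.
gamma, t = 0.99, 100
print(gamma ** t * big)
```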

This is especially a problem in DQN and other function-approximation methods, because these methods generalize over the state, action, and reward spaces. Rewards of -1 and 1 are massively different, whereas rewards of 1,000,000,001 and 999,999,999 are essentially identical once a function approximator generalizes over them.
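
In fact, with the numbers in this example the collapse is literal, not just approximate: most deep learning frameworks default to single precision (float32), where the spacing between adjacent representable numbers near 1e9 is 64, so the two huge rewards round to exactly the same value. A quick check:

```python
import numpy as np

# In float32, the two huge rewards are indistinguishable.
a = np.float32(1_000_000_001)
b = np.float32(999_999_999)
print(a == b)                        # True: both round to 1.0e9
print(np.spacing(np.float32(1e9)))  # 64.0, the float32 gap at this magnitude

# The small-scale rewards stay perfectly distinguishable.
print(np.float32(1.0) == np.float32(-1.0))  # False
```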

answered Dec 21 '25 by Rui Nian


