 

Efficient reward range in deep reinforcement learning

When selecting reward values in DQN, Actor-Critic, or A3C, are there any common rules for choosing the reward value?

I have heard that a reward range of (-1, +1) is quite an efficient choice.

Can you offer any suggestions, and the reasoning behind them?

asked Dec 17 '25 by WKIm

1 Answer

Ideally, you want to normalize your rewards (i.e., zero mean and unit variance). In your example, rewards between -1 and 1 are roughly consistent with this. I believe the reason is that normalization speeds up gradient descent when updating your neural network's parameters, and it also lets your RL agent distinguish good and bad actions more effectively.
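
Here is a minimal sketch of how you might normalize rewards on the fly, using Welford's online mean/variance algorithm. This is just an illustration, not anything from a specific library; the class name and API are made up:

```python
import numpy as np

class RewardNormalizer:
    """Running reward normalizer (Welford's online mean/variance)."""

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, r):
        # Fold one observed reward into the running statistics.
        self.count += 1
        delta = r - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (r - self.mean)

    def normalize(self, r):
        # Shift to zero mean and rescale to unit variance.
        std = np.sqrt(self.m2 / max(self.count, 1)) + self.eps
        return (r - self.mean) / std
```

For what it's worth, the original DQN paper took an even simpler route and clipped rewards to [-1, 1]; in practice something like np.clip(r, -1.0, 1.0) is often good enough.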

An example: imagine we are building an agent to cross a street. If it crosses successfully, it gets a reward of 1; if it gets hit by a car, it gets a reward of -1; and each step yields a reward of 0. Relative to the scale of the rewards, success (+1) sits far above failure (-1).

However, if we instead give the agent a reward of 1,000,000,001 for successfully crossing the road and a reward of 999,999,999 for getting hit by a car (this scenario and the one above are identical once normalized), the success is no longer as pronounced. Moreover, once you discount such high rewards, the absolute gap between the two outcomes shrinks even further, making them even harder to tell apart.
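
You can verify both points numerically. The snippet below (my own illustration, not from the answer) shows that the two reward schemes standardize to exactly the same values, and that discounting shrinks the absolute gap between the two huge rewards:

```python
import numpy as np

small = np.array([1.0, -1.0])                     # success vs. hit by car
big = np.array([1_000_000_001.0, 999_999_999.0])  # same scenario, shifted

def standardize(x):
    # Rescale to zero mean and unit variance.
    return (x - x.mean()) / x.std()

print(standardize(small))  # [ 1. -1.]
print(standardize(big))    # [ 1. -1.]  -> identical once normalized

# Discounting shrinks the absolute gap between the two huge rewards
# (from 2.0 down to ~0.73 after 100 steps) while both stay enormous.
gamma, t = 0.99, 100
print(gamma ** t * big)
```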

This is especially a problem in DQN and other function-approximation methods, because these methods generalize over the state, action, and reward spaces. Rewards of -1 and 1 are massively different, whereas rewards of 1,000,000,001 and 999,999,999 are essentially identical once a function approximator generalizes over them.
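
In fact, with the numbers in this example the collapse is literal, not just approximate: most deep learning frameworks default to single precision (float32), where the spacing between adjacent representable numbers near 1e9 is 64, so the two huge rewards round to exactly the same value. A quick check:

```python
import numpy as np

# In float32, the two huge rewards are indistinguishable.
a = np.float32(1_000_000_001)
b = np.float32(999_999_999)
print(a == b)                        # True: both round to 1.0e9
print(np.spacing(np.float32(1e9)))  # 64.0, the float32 gap at this magnitude

# The small-scale rewards stay perfectly distinguishable.
print(np.float32(1.0) == np.float32(-1.0))  # False
```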

answered Dec 21 '25 by Rui Nian


