
Why is episode done after 200 time steps (Gym environment MountainCar)?

When using the MountainCar-v0 environment from OpenAI Gym in Python, the value done becomes True after 200 time steps. Why is that? Since the goal state hasn't been reached, the episode shouldn't be done.

import gym

env = gym.make('MountainCar-v0')
env.reset()
for t in range(300):
    env.render()
    state, reward, done, info = env.step(env.action_space.sample())  # random action
    print(t, done)  # done becomes True after 200 steps

I want to run the step method until the car reaches the flag and then break out of the loop. Is this possible? Something like this:

n_episodes = 10
for i in range(n_episodes):
    env.reset()
    done = False  # reset the flag at the start of each episode
    while not done:
        env.render()
        state, reward, done, _ = env.step(env.action_space.sample())
asked Mar 14 '17 by needRhelp

People also ask

What is observation space in gym?

The observation_space defines the structure as well as the legitimate values for the observation of the state of the environment. The observation can be different things for different environments.

What is a gym wrapper?

A wrapper overrides how the environment processes observations, rewards, and actions. Three classes provide this functionality; for example, gym.ObservationWrapper is used to modify the observations returned by the environment. To do this, override the observation method of the environment.
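To illustrate the idea, here is a plain-Python sketch of the observation-wrapper pattern. It is not gym's actual ObservationWrapper class, just an imitation of the mechanism: the wrapper intercepts every observation the inner environment returns and transforms it before the agent sees it. The class and method names below are hypothetical.

```python
class ObservationScaler:
    """Sketch of the observation-wrapper pattern (not gym's real class):
    intercept and transform every observation from the wrapped env."""

    def __init__(self, env, scale):
        self.env = env
        self.scale = scale

    def observation(self, obs):
        # The transformation you would put in an overridden observation()
        return obs * self.scale

    def reset(self):
        return self.observation(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self.observation(obs), reward, done, info
```

RewardWrapper and ActionWrapper follow the same shape, hooking the reward and the action respectively instead of the observation.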


2 Answers

The current newest version of gym force-stops the environment after 200 steps even if you don't use env.monitor. To avoid this, unwrap the time limit with env = gym.make("MountainCar-v0").env
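To make the "run until the flag" loop concrete, here is a minimal sketch of the pattern. It uses a tiny stand-in environment (StubEnv, invented here) so it runs without gym installed; with gym you would replace StubEnv() with gym.make("MountainCar-v0").env, which strips the TimeLimit wrapper, and pass a real action instead of None.

```python
import random

class StubEnv:
    """Stand-in with gym's reset/step interface (4-tuple step),
    so the episode-loop pattern is runnable without gym."""

    def reset(self):
        self.position = 0.0
        return self.position

    def step(self, action):
        self.position += random.uniform(0.05, 0.1)
        done = self.position >= 0.5  # "flag reached"
        return self.position, -1.0, done, {}

env = StubEnv()
n_episodes = 3
for i in range(n_episodes):
    env.reset()
    done = False                 # reset the flag each episode
    steps = 0
    while not done:
        state, reward, done, info = env.step(None)
        steps += 1
    print(f"episode {i} finished after {steps} steps")
```

Because the unwrapped environment has no step limit, the while loop ends only when the environment itself reports done.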

answered Oct 11 '22 by Scitator


Copied from https://github.com/openai/gym/wiki/FAQ:

Environments are intended to have various levels of difficulty, in order to benchmark the ability of reinforcement learning agents to solve them. Many of the environments are beyond the current state of the art, so don't expect to solve all of them. (If you do, please apply).

If you want to experiment with a variant of an environment that behaves differently, you should give it a new name so that you won't erroneously compare your agent running on an easy variant to someone else's agent running on the original environment. For instance, the MountainCar environment is hard partly because there's a limit of 200 timesteps after which it resets to the beginning. Successful agents must solve it in less than 200 timesteps. For testing purposes, you could make a new environment MountainCarMyEasyVersion-v0 with different parameters by adapting one of the calls to register found in gym/gym/envs/__init__.py:

gym.envs.register(
    id='MountainCarMyEasyVersion-v0',
    entry_point='gym.envs.classic_control:MountainCarEnv',
    max_episode_steps=250,      # MountainCar-v0 uses 200
    reward_threshold=-110.0,
)
env = gym.make('MountainCarMyEasyVersion-v0')

Because these environment names are only known to your code, you won't be able to upload it to the scoreboard.
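The max_episode_steps mechanism behind this is simple: gym.make wraps the registered environment in a TimeLimit wrapper that counts steps and forces done once the limit is hit, whether or not the goal was reached. A rough sketch of the idea (not gym's actual implementation):

```python
class TimeLimit:
    """Sketch of gym's TimeLimit idea: force done after max_episode_steps,
    regardless of whether the underlying env reached its goal."""

    def __init__(self, env, max_episode_steps):
        self.env = env
        self.max_episode_steps = max_episode_steps
        self.elapsed = 0

    def reset(self):
        self.elapsed = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.elapsed += 1
        if self.elapsed >= self.max_episode_steps:
            done = True  # episode cut off, goal or not
        return obs, reward, done, info
```

This is why MountainCar-v0 reports done at step 200 and why accessing .env (the unwrapped environment) removes the limit.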

answered Oct 11 '22 by catherio