
Observations meaning - OpenAI Gym

I want to know the specification of the observation for CartPole-v0 in OpenAI Gym (https://gym.openai.com/).

For example, the following code prints the observation at each step. A single observation looks like [-0.061586 -0.75893141 0.05793238 1.15547541], and I want to know what each number means. I would also like a general way to find the specification for other environments, such as MountainCar-v0, MsPacman-v0 and so on.

I tried reading https://github.com/openai/gym, but I could not find this information there. How can I find these specifications?

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break

(from https://gym.openai.com/docs)

The output is the following:

[-0.061586   -0.75893141  0.05793238  1.15547541]
[-0.07676463 -0.95475889  0.08104189  1.46574644]
[-0.0958598  -1.15077434  0.11035682  1.78260485]
[-0.11887529 -0.95705275  0.14600892  1.5261692 ]
[-0.13801635 -0.7639636   0.1765323   1.28239155]
[-0.15329562 -0.57147373  0.20218013  1.04977545]
Episode finished after 14 timesteps
[-0.02786724  0.00361763 -0.03938967 -0.01611184]
[-0.02779488 -0.19091794 -0.03971191  0.26388759]
[-0.03161324  0.00474768 -0.03443415 -0.04105167]
asked Sep 06 '16 05:09 by ryo



2 Answers

The observation space used in OpenAI Gym is not exactly the same as the one in the original paper. OpenAI's wiki has the answer: the observation space is 4-dimensional, and each dimension is as follows:

Num  Observation           Min        Max
0    Cart Position         -2.4       2.4
1    Cart Velocity         -Inf       Inf
2    Pole Angle            ~ -41.8°   ~ 41.8°
3    Pole Velocity At Tip  -Inf       Inf
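For environments without such a table, the bounds can usually be read from the environment object itself. The following is a minimal sketch, assuming the classic Gym API in which `env.observation_space` and `env.action_space` expose `shape`, `low` and `high` (for Box spaces) or `n` (for Discrete spaces). The helper names `describe_space` and `inspect_env` are hypothetical and not part of Gym.

```python
def describe_space(space):
    """Summarise a space using only the attributes it actually exposes.

    Box-like spaces are expected to have shape/low/high, Discrete-like
    spaces are expected to have n. Missing attributes are skipped.
    """
    summary = {"type": type(space).__name__}
    for attr in ("shape", "low", "high", "n"):
        if hasattr(space, attr):
            summary[attr] = getattr(space, attr)
    return summary


def inspect_env(env_id):
    """Create an environment and describe its observation and action spaces."""
    import gym  # imported lazily so describe_space can be used without Gym

    env = gym.make(env_id)
    try:
        return {
            "observation_space": describe_space(env.observation_space),
            "action_space": describe_space(env.action_space),
        }
    finally:
        env.close()
```

Calling `inspect_env('CartPole-v0')` or `inspect_env('MountainCar-v0')` with Gym installed should return the shape and bounds for each space. Note that the angle values in the question's output are only around 0.2 when the episode ends, which suggests they are in radians, so it is worth comparing the degree figures in the table with what `observation_space.high` reports.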

answered Oct 18 '22 03:10 by RoastDuck


On the OpenAI Gym website, the paragraph describing each environment is followed by a reference that explains the environment in detail. For CartPole-v0, for example, all the details are in:

[Barto83] AG Barto, RS Sutton and CW Anderson, "Neuronlike Adaptive Elements That Can Solve Difficult Learning Control Problems", IEEE Transactions on Systems, Man, and Cybernetics, 1983.

In that paper you can read that the cart-pole has four state variables:

  1. position of the cart on the track
  2. angle of the pole with the vertical
  3. cart velocity
  4. rate of change of the angle

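So the observation is simply a vector holding the values of these four state variables. In Gym's vector they appear in the order cart position, cart velocity, pole angle, pole velocity, which differs from the order listed above.

Similarly, the details of MountainCar-v0 can be found in: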
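To make the four numbers easier to read, the observation can be wrapped in a named tuple. This is a small sketch under the assumption that the vector follows the order given in the other answer's table (cart position, cart velocity, pole angle, pole velocity at tip), not the order of the list from the paper. The names `CartPoleObservation` and `label_observation` are made up for illustration.

```python
from collections import namedtuple

# Field order is an assumption taken from the table in the other answer.
CartPoleObservation = namedtuple(
    "CartPoleObservation",
    ["cart_position", "cart_velocity", "pole_angle", "pole_velocity_at_tip"],
)


def label_observation(values):
    """Attach names to a 4-element CartPole observation vector."""
    if len(values) != 4:
        raise ValueError("expected 4 values, got {}".format(len(values)))
    return CartPoleObservation(*(float(v) for v in values))


# First observation printed in the question.
obs = label_observation([-0.061586, -0.75893141, 0.05793238, 1.15547541])
print(obs)
```

The same function accepts the NumPy array returned by `env.step(action)`, since it only relies on `len` and iteration.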

[Moore90] A Moore, Efficient Memory-Based Learning for Robot Control, PhD thesis, University of Cambridge, 1990.

and so on.

answered Oct 18 '22 03:10 by Pablo EM