I want to know the specification of the observations in CartPole-v0 in OpenAI Gym (https://gym.openai.com/).
For example, the following code prints observations. One observation looks like [-0.061586 -0.75893141 0.05793238 1.15547541].
I want to know what these numbers mean, and I would also like a way to find the specifications of other environments such as MountainCar-v0, MsPacman-v0, and so on.
I tried reading https://github.com/openai/gym, but I couldn't find the answer there. Could you tell me how to find these specifications?
import gym

env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        action = env.action_space.sample()  # pick a random action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
(from https://gym.openai.com/docs)
The output is the following:
[-0.061586 -0.75893141 0.05793238 1.15547541]
[-0.07676463 -0.95475889 0.08104189 1.46574644]
[-0.0958598 -1.15077434 0.11035682 1.78260485]
[-0.11887529 -0.95705275 0.14600892 1.5261692 ]
[-0.13801635 -0.7639636 0.1765323 1.28239155]
[-0.15329562 -0.57147373 0.20218013 1.04977545]
Episode finished after 14 timesteps
[-0.02786724 0.00361763 -0.03938967 -0.01611184]
[-0.02779488 -0.19091794 -0.03971191 0.26388759]
[-0.03161324 0.00474768 -0.03443415 -0.04105167]
The observation space used in OpenAI Gym is not exactly the same as in the original paper. Look at OpenAI's wiki to find the answer. The observation space is 4-dimensional, and each dimension is as follows:
Num  Observation            Min       Max
 0   Cart Position          -2.4      2.4
 1   Cart Velocity          -Inf      Inf
 2   Pole Angle             ~ -41.8°  ~ 41.8°
 3   Pole Velocity At Tip   -Inf      Inf
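You can also query these bounds programmatically instead of looking them up. A minimal sketch using the same classic Gym API as the code above (the comments show what I would expect, not guaranteed output):

import gym

env = gym.make('CartPole-v0')
print(env.observation_space)       # a Box(4,) -- 4-dimensional continuous space
print(env.observation_space.high)  # per-dimension upper bounds
print(env.observation_space.low)   # per-dimension lower bounds
print(env.action_space)            # Discrete(2): 0 = push cart left, 1 = push cart right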
After the paragraph describing each environment on the OpenAI Gym website, there is always a reference that explains the environment in detail. For example, for CartPole-v0 you can find all the details in:
[Barto83] A. G. Barto, R. S. Sutton and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems", IEEE Transactions on Systems, Man, and Cybernetics, 1983.
In that paper you can read that the cart-pole has four state variables: the cart position, the cart velocity, the pole angle, and the pole velocity at the tip. So the observation is simply a vector with the values of those four state variables.
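So, if you want named variables, you can just unpack the observation vector. A small sketch; the variable names are only illustrative, following the table above:

import gym

env = gym.make('CartPole-v0')
observation = env.reset()

# Unpack the 4-D observation into the four state variables from the table above.
# Note: the pole angle is stored in radians in Gym's implementation.
cart_position, cart_velocity, pole_angle, pole_velocity_at_tip = observation
print("cart position:", cart_position)
print("cart velocity:", cart_velocity)
print("pole angle:", pole_angle)
print("pole velocity at tip:", pole_velocity_at_tip)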
Similarly, the details of MountainCar-v0 can be found in:
[Moore90] A. Moore, Efficient Memory-Based Learning for Robot Control, PhD thesis, University of Cambridge, 1990.
and so on.
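Beyond the papers, the same introspection works for any registered environment, so a quick way to survey several specifications is a loop like the following sketch (MsPacman-v0 additionally requires the Atari dependencies, e.g. pip install gym[atari]):

import gym

for env_id in ['CartPole-v0', 'MountainCar-v0', 'MsPacman-v0']:
    env = gym.make(env_id)
    print(env_id)
    print("  observation space:", env.observation_space)
    print("  action space:", env.action_space)
    # Atari environments also expose human-readable action names:
    if hasattr(env.unwrapped, 'get_action_meanings'):
        print("  action meanings:", env.unwrapped.get_action_meanings())
    env.close()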