Difference between OpenAI Gym environments 'CartPole-v0' and 'CartPole-v1'

I can't find an exact description of the differences between the OpenAI Gym environments 'CartPole-v0' and 'CartPole-v1'.

Both environments have separate official websites dedicated to them (see 1 and 2), though I can only find one source file, without any version identification, in the gym GitHub repository (see 3). I also checked via the debugger which files are actually loaded, and both seem to load that same file. The only difference seems to be in their internally assigned max_episode_steps and reward_threshold, which can be accessed as shown below. CartPole-v0 has the values 200/195.0 and CartPole-v1 has the values 500/475.0. The rest seems identical at first glance.

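import gym

env = gym.make("CartPole-v1")
# Each environment carries the spec it was registered with
print(env.spec.max_episode_steps)  # 500 for CartPole-v1, 200 for CartPole-v0
print(env.spec.reward_threshold)   # 475.0 for CartPole-v1, 195.0 for CartPole-v0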

I would therefore appreciate it if someone could describe the exact differences for me or point me to a website that does. Thank you very much!

asked Jul 05 '19 13:07

PaulOnStackoverflow

1 Answer

As you have probably noticed, OpenAI Gym sometimes provides several versions of the same environment. The versions usually share the main environment logic, but some parameters are configured with different values. These versions are managed through a feature called the registry.
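To illustrate the idea of a registry, here is a minimal sketch in plain Python. It is not Gym's actual implementation: `ToyEnvSpec`, `toy_registry`, `toy_register`, and the `Toy-v0`/`Toy-v1` IDs are hypothetical names made up for this example. It only shows how two IDs can point at the same environment class while carrying different parameters.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ToyEnvSpec:
    """Hypothetical stand-in for a registry entry (not Gym's own class)."""
    id: str
    entry_point: str
    max_episode_steps: Optional[int] = None
    reward_threshold: Optional[float] = None


# Hypothetical registry: maps an environment ID to its spec.
toy_registry: Dict[str, ToyEnvSpec] = {}


def toy_register(env_id: str, entry_point: str, **params) -> None:
    """Store a spec under its ID and refuse duplicate IDs."""
    if env_id in toy_registry:
        raise ValueError(f"{env_id} is already registered")
    toy_registry[env_id] = ToyEnvSpec(id=env_id, entry_point=entry_point, **params)


# Two versions: same entry point (same logic), different parameters.
toy_register("Toy-v0", "toy.module:ToyEnv",
             max_episode_steps=200, reward_threshold=195.0)
toy_register("Toy-v1", "toy.module:ToyEnv",
             max_episode_steps=500, reward_threshold=475.0)

v0, v1 = toy_registry["Toy-v0"], toy_registry["Toy-v1"]
same_logic = v0.entry_point == v1.entry_point
print(same_logic, v0.max_episode_steps, v1.max_episode_steps)
```

Looking up an ID returns the stored spec, so whatever creates the environment can build the same class in both cases and then apply the version-specific settings.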

In the case of the CartPole environment, you can find the two registered versions in this source code. As you can see in lines 50 to 65, there are two CartPole versions, tagged v0 and v1, which differ in the parameters max_episode_steps and reward_threshold:

register(
    id='CartPole-v0',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=200,
    reward_threshold=195.0,
)

register(
    id='CartPole-v1',
    entry_point='gym.envs.classic_control:CartPoleEnv',
    max_episode_steps=500,
    reward_threshold=475.0,
)

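Both parameters confirm your guess about the difference between CartPole-v0 and CartPole-v1: the two versions share the same CartPoleEnv class and differ only in max_episode_steps and reward_threshold.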
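To see what the two numbers mean in practice, here is a small sketch in plain Python. It does not use Gym; `run_episode` is a hypothetical helper, and it assumes CartPole's reward of +1 per step and that the episode is cut off once max_episode_steps is reached. Under those assumptions the step limit is also the highest possible return, and reward_threshold sits just below it.

```python
def run_episode(steps_survived: int, max_episode_steps: int) -> float:
    """Total reward of one episode, assuming +1 reward per step.

    steps_survived is how long the pole would stay up without a limit;
    the episode is cut off once max_episode_steps is reached.
    """
    total = 0.0
    for step in range(1, steps_survived + 1):
        total += 1.0  # assumed reward of +1 for every step
        if step >= max_episode_steps:
            break  # step limit reached, episode ends here
    return total


# (max_episode_steps, reward_threshold) as registered for each version
specs = {"CartPole-v0": (200, 195.0), "CartPole-v1": (500, 475.0)}

for env_id, (limit, threshold) in specs.items():
    best = run_episode(10_000, limit)  # a policy that never drops the pole
    print(env_id, "best return:", best, "threshold:", threshold)
```

So an agent that balances the pole indefinitely scores at most 200 in v0 and 500 in v1, which makes v1 the longer version of the same task.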

answered Oct 18 '22 09:10

Pablo EM


Tags:

machine-learning

reinforcement-learning

openai-gym
