I am getting to know OpenAI's Gym (0.25.1) using Python 3.10, with the environment set to 'FrozenLake-v1' (code below).
According to the documentation, calling env.step() should return a tuple containing 4 values (observation, reward, done, info). However, when running my code accordingly, I get a ValueError:
Problematic code:
observation, reward, done, info = env.step(new_action)
Error:
3 new_action = env.action_space.sample()
----> 5 observation, reward, done, info = env.step(new_action)
7 # here's a look at what we get back
8 print(f"observation: {observation}, reward: {reward}, done: {done}, info: {info}")
ValueError: too many values to unpack (expected 4)
Adding one more variable fixes the error:
a, b, c, d, e = env.step(new_action)
print(a, b, c, d, e)
Output:
5 0 True True {'prob': 1.0}
My interpretation:
5 should be observation
0 is reward
{'prob': 1.0} is info
True is done
So what's the leftover boolean standing for?
Thank you for your help!
Complete code:
import gym
env = gym.make('FrozenLake-v1', new_step_api=True, render_mode='ansi') # build environment
current_obs = env.reset() # start new episode
for e in env.render():
    print(e)
new_action = env.action_space.sample() # random action
observation, reward, done, info = env.step(new_action) # perform action, ValueError!
for e in env.render():
    print(e)
You may want to consider using the new API when creating the env, because support for the old one is only provided through a temporary wrapper and it may cease to be backward compatible some day. Using the new API could have certain minor ramifications for your code (in one line: don't simply do done = truncated).
Let us quickly understand the change.
To use the new API, add the new_step_api=True option (note: with the latest API, the new_step_api option is no longer needed), e.g.
env = gym.make('MountainCar-v0', new_step_api=True)
This causes the env.step() method to return five items instead of four. What is this extra one?
This is done to remove the ambiguity in the done signal. In the old API, done=True did not distinguish between the environment terminating and the episode being truncated. Previously this was worked around by setting info['TimeLimit.truncated'] through the TimeLimit wrapper when a time limit was hit. None of that is required now, and env.step() returns:
obs, reward, terminated, truncated, info = env.step(action)
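For comparison, under the old 4-tuple API the same distinction had to be recovered from the info dict populated by the TimeLimit wrapper; a minimal sketch of that old-style check (assuming a time-limited env):

obs, reward, done, info = env.step(action)
truncated = info.get('TimeLimit.truncated', False)  # set by the TimeLimit wrapper on timeout
terminated = done and not truncated  # done without truncation means the env itself terminated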
How could this impact your code: if your game has some kind of max_steps or timeout, you should read the truncated variable IN ADDITION to the terminated variable to see whether your game ended. Depending on the kind of rewards you have, you may want to tweak things slightly. The simplest option is just to do
done = truncated or terminated
and then proceed to reuse your old code.
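Applied to the FrozenLake snippet from the question (keeping new_step_api=True as in the original code), a minimal sketch of the fix could look like this:

import gym

env = gym.make('FrozenLake-v1', new_step_api=True, render_mode='ansi')  # build environment
current_obs = env.reset()  # start new episode

new_action = env.action_space.sample()  # random action

# new API: step() returns five values
observation, reward, terminated, truncated, info = env.step(new_action)
done = terminated or truncated  # episode is over in either case

print(f"observation: {observation}, reward: {reward}, done: {done}, info: {info}")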