
OpenAI GYM's env.step(): what are the values?

I am getting to know OpenAI's GYM (0.25.1) using Python 3.10, with gym's environment set to 'FrozenLake-v1' (code below).

According to the documentation, calling env.step() should return a tuple containing 4 values (observation, reward, done, info). However, when running my code accordingly, I get a ValueError:

Problematic code:

observation, reward, done, info = env.step(new_action)

Error:

      3 new_action = env.action_space.sample()
----> 5 observation, reward, done, info = env.step(new_action)
      7 # here's a look at what we get back
      8 print(f"observation: {observation}, reward: {reward}, done: {done}, info: {info}")

ValueError: too many values to unpack (expected 4)

Adding one more variable fixes the error:

a, b, c, d, e = env.step(new_action)
print(a, b, c, d, e)

Output:

5 0 True True {'prob': 1.0}

My interpretation:

  • 5 should be the observation
  • 0 is the reward
  • {'prob': 1.0} is info
  • One of the Trues is done

So what does the leftover boolean stand for?

Thank you for your help!


Complete code:

import gym

env = gym.make('FrozenLake-v1', new_step_api=True, render_mode='ansi') # build environment

current_obs = env.reset() # start new episode

for e in env.render():
    print(e)
    
new_action = env.action_space.sample() # random action

observation, reward, done, info = env.step(new_action) # perform action, ValueError!

for e in env.render():
    print(e)
doesnotcompile asked Dec 03 '25 15:12



1 Answer

Consider using the new step API when creating the env: the old four-value behaviour is only supported through a temporary compatibility wrapper, and that backward compatibility may be dropped some day. Using the new API has minor ramifications for your code (in one line: don't simply do done = truncated).

Let us quickly understand the change.

To use the new API, add the new_step_api=True option (note: with the latest API, the new_step_api option is not needed), e.g.

env = gym.make('MountainCar-v0', new_step_api=True)

This causes the env.step() method to return five items instead of four. What is this extra one?

  • In the old API, done was returned as True if the episode ended in any way.
  • In the new API, done is split into two parts:
      • terminated=True if the environment terminates (e.g. due to task completion or failure).
      • truncated=True if the episode is truncated due to a time limit or another reason that is not defined as part of the task MDP.

This is done to remove the ambiguity in the done signal. done=True in the old API did not distinguish between the environment terminating and the episode being truncated. Previously, this problem was worked around by setting info['TimeLimit.truncated'] through the TimeLimit wrapper when a time limit was hit. None of that is required now, and env.step() returns:

obs, reward, terminated, truncated, info = env.step(action)
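
To make the mapping concrete, here is a minimal sketch of a helper that names the five returned values and also accepts the older four-value tuple. The name unpack_step is hypothetical and not part of gym, and the fallback for four-value results assumes that a time-limit cutoff was flagged in info['TimeLimit.truncated'], as described above. It is fed the exact tuple printed in the question, so it needs no gym installation to run.

```python
from typing import Any, Dict, Tuple

StepResult = Tuple[Any, float, bool, bool, Dict[str, Any]]


def unpack_step(result: tuple) -> StepResult:
    """Return (obs, reward, terminated, truncated, info) for either step API.

    Hypothetical helper for illustration; it is not part of gym.
    """
    if len(result) == 5:
        # New API: the two end-of-episode flags are already separate.
        obs, reward, terminated, truncated, info = result
    elif len(result) == 4:
        # Old API: a single `done` flag. Assumption: a time-limit cutoff is
        # marked by info['TimeLimit.truncated'], as the TimeLimit wrapper did.
        obs, reward, done, info = result
        truncated = bool(info.get("TimeLimit.truncated", False))
        terminated = bool(done) and not truncated
    else:
        raise ValueError(f"expected 4 or 5 values from step(), got {len(result)}")
    return obs, reward, bool(terminated), bool(truncated), info


# The tuple printed in the question, passed through the helper.
obs, reward, terminated, truncated, info = unpack_step((5, 0, True, True, {"prob": 1.0}))
print(f"obs={obs} reward={reward} terminated={terminated} truncated={truncated} info={info}")
```

Read this way, the two booleans in the question's output are terminated and truncated, in that order.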

How this could impact your code: if your game has some kind of max_steps or timeout, you should read the truncated variable in addition to the terminated variable to see whether your game has ended. Depending on the kind of rewards you have, you may want to tweak things slightly. The simplest option is to do

done = truncated or terminated 

and then proceed to reuse your old code.
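
As a rough illustration of that pattern, the sketch below runs an episode loop that stops on either flag. StubEnv is a hypothetical stand-in written for this example so that it runs without gym; it only mimics the five-value step() signature, and its goal and max_steps parameters are made up. With a real environment you would pass the object returned by gym.make(...) instead.

```python
import random


class StubEnv:
    """Hypothetical stand-in for a gym environment with the five-value step API."""

    def __init__(self, goal: int = 3, max_steps: int = 10) -> None:
        self.goal = goal
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self) -> int:
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action: int):
        self.steps += 1
        self.position = max(0, self.position + action)
        terminated = self.position >= self.goal    # the task itself is over
        truncated = self.steps >= self.max_steps   # the time limit was reached
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Run one episode and report the total reward and how the episode ended."""
    obs = env.reset()
    total_reward = 0.0
    terminated = truncated = False
    done = False
    while not done:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total_reward += reward
        done = terminated or truncated  # either flag ends the episode
    return total_reward, terminated, truncated


rng = random.Random(0)
print(run_episode(StubEnv(), lambda obs: rng.choice([-1, 1])))
```

Returning both flags alongside the reward keeps the information about why the episode ended, which is what a bare done would throw away.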

Allohvk answered Dec 06 '25 06:12
