Today, while trying to implement an RL agent with OpenAI Gym, I ran into a problem: it seems that agents are always trained starting from the initial state returned by env.reset(), i.e.
import gym

env = gym.make("CartPole-v0")
initial_observation = env.reset()  # <-- Note
done = False
while not done:
    action = env.action_space.sample()
    next_observation, reward, done, info = env.step(action)
env.close()  # close the environment
So it is natural that the agent follows the route env.reset() -(action)-> next_state -(action)-> next_state -(action)-> ... -(action)-> done, which is one episode. But how can the agent start from a specific state, such as a middle state, and take an action from there? For example, suppose I sample an experience (s, a, r, ns, done) from the replay buffer: what if I want the agent to start directly from the state ns, pick an action with a Q-network, and then roll forward for n steps? Something like this:
import gym

env = gym.make("CartPole-v0")
initial_observation = ns  # not env.reset()
observation = initial_observation
done = False
while not done:
    action = DQN(observation)  # pick the action with a Q-network
    next_observation, reward, done, info = env.step(action)
    observation = next_observation
    # break n steps later or when done is true
env.close()  # close the environment
But even if I set a variable initial_observation to ns, the env will not be aware of it at all. How can I tell the gym env that I want to set its initial observation to ns, so that the agent knows the specific start state and can continue training directly from that observation (i.e. start the environment in that specific state)?
AFAIK, the current implementation of most OpenAI gym envs (including the CartPole-v0 you used in your question) doesn't provide any mechanism to initialize the environment in a given state.
However, it shouldn't be too complex to modify the CartPoleEnv.reset() method so that it accepts an optional parameter acting as the initial state.
I recommend using and adapting the following code to your needs; it works well, and I used it in my AlphaZero implementation.
This example is for CartPole, but you should be able to adapt it easily to other envs.
from copy import deepcopy

import gym
import numpy as np
from gym.spaces import Discrete, Dict, Box


class CartPole:
    def __init__(self, config=None):
        self.env = gym.make("CartPole-v0")
        self.action_space = Discrete(2)
        self.observation_space = self.env.observation_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, rew, done, info = self.env.step(action)
        return obs, rew, done, info

    def set_state(self, state):
        # Restore a previously saved environment (a deep copy of the whole env)
        self.env = deepcopy(state)
        obs = np.array(list(self.env.unwrapped.state))
        return obs

    def get_state(self):
        # Return a deep copy of the whole env so it can be restored later
        return deepcopy(self.env)

    def render(self):
        self.env.render()

    def close(self):
        self.env.close()
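To make the save/restore flow concrete, here is a minimal usage sketch of the class above (my own example, not part of the original answer): take a few steps, snapshot the env with get_state(), keep acting, then roll back with set_state() and branch off from the saved point.

env = CartPole()
obs = env.reset()
for _ in range(5):
    obs, rew, done, info = env.step(env.action_space.sample())

snapshot = env.get_state()              # deep copy of the underlying gym env
obs, rew, done, info = env.step(0)      # continue the original trajectory

obs_restored = env.set_state(snapshot)  # roll back to the saved point
obs2, rew2, done2, info2 = env.step(0)  # branch off from the same state
env.close()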
The reason a direct assignment to env.state does not work is that the environment returned by gym.make is actually a gym.wrappers.TimeLimit object.
To achieve what you intended, you also have to assign the ns value to the unwrapped environment. So, something like this should do the trick:
env.reset()
env.state = env.unwrapped.state = ns
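As a minimal end-to-end sketch (assuming ns is a valid 4-element CartPole state of cart position, cart velocity, pole angle, and pole angular velocity; the concrete values below are made up):

import gym
import numpy as np

env = gym.make("CartPole-v0")
env.reset()  # still call reset() so the TimeLimit wrapper starts a fresh episode

ns = np.array([0.02, 0.1, -0.03, 0.05])  # hypothetical state taken from a replay buffer
env.state = env.unwrapped.state = ns

# The next step now continues from ns instead of the random initial state
obs, reward, done, info = env.step(env.action_space.sample())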
I would suggest you extend the CartPole environment so that the reset method does what you need, and then wrap the environment yourself, e.g.:
import numpy as np
from gym.envs.classic_control import CartPoleEnv
from gym.wrappers import TimeLimit


class ExtendedCartPoleEnv(CartPoleEnv):
    def reset(self):
        self.state = your_very_special_method()
        self.steps_beyond_done = None
        return np.array(self.state, dtype=np.float32)


max_episode_steps = 200
env = ExtendedCartPoleEnv()
env = TimeLimit(env, max_episode_steps)
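Usage is then the same as for any other gym env; only the origin of the start state changes. A short sketch, assuming your_very_special_method() returns a valid 4-element CartPole state (for example the ns you sampled from your replay buffer):

obs = env.reset()  # obs is whatever your_very_special_method() produced
done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()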
I've just tweaked the original reset method found here.
You can also extend the original environment so that self.reset takes an argument, but this is not standard. The wrapped environment wouldn't accept the argument, so you would need to call env.unwrapped.reset directly. This gets ugly because env.step will then complain that env.reset has not been called, etc. There are ways to make it work, but then again, it diverges from what a regular gym environment is supposed to look like.
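For completeness, a sketch of that non-standard variant, following the caveats described above (the class name, the start_state argument, and the workaround of calling the plain reset() first are my own assumptions, not part of the answer):

import numpy as np
from gym.envs.classic_control import CartPoleEnv
from gym.wrappers import TimeLimit


class SettableCartPoleEnv(CartPoleEnv):  # hypothetical name
    def reset(self, start_state=None):
        if start_state is None:
            return super().reset()
        self.state = np.array(start_state, dtype=np.float32)
        self.steps_beyond_done = None
        return np.array(self.state, dtype=np.float32)


env = TimeLimit(SettableCartPoleEnv(), max_episode_steps=200)
env.reset()                          # keep the wrapper's bookkeeping happy
env.unwrapped.reset(start_state=ns)  # then set the custom start state directly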