 

How to implement a custom environment in keras-rl / OpenAI Gym?

I'm a complete newbie to Reinforcement Learning and have been searching for a framework/module to easily navigate this treacherous terrain. In my search I've come across two modules: keras-rl and OpenAI Gym.

I can get both of them to work on the examples shared on their wikis, but they come with predefined environments and have little or no information on how to set up my own custom environment.

I would be really thankful if anyone could point me towards a tutorial, or just explain to me how I can set up a non-game environment.

Manipal King asked Jun 10 '17



1 Answer

I've been working on these libraries for some time and can share some of my experiments.

As an example of a custom environment, let us first consider a text environment: https://github.com/openai/gym/blob/master/gym/envs/toy_text/hotter_colder.py

For a custom environment, a few things need to be defined:

  1. Constructor (__init__ method)
  2. Action space
  3. Observation space (see https://github.com/openai/gym/tree/master/gym/spaces for all available gym spaces; a space is a kind of data structure describing the valid values)
  4. _seed method (not sure that it's mandatory)
  5. _step method accepting an action as a param and returning the observation (the state after the action), the reward (for the transition to the new state), done (a boolean flag), and optional additional info
  6. _reset method that implements the logic of starting a fresh episode (a minimal skeleton combining all of these is sketched below)
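
For concreteness, here is a minimal sketch of such an environment. The class name, the hidden-target logic, and all numbers are purely illustrative, and the underscore-prefixed method names follow the old gym API used at the time (newer Gym releases use step/reset/seed without the leading underscore):

    import gym
    import numpy as np
    from gym import spaces
    from gym.utils import seeding

    class GuessDirectionEnv(gym.Env):
        """Toy non-game environment: move a value towards a hidden target."""

        def __init__(self):
            self.action_space = spaces.Discrete(2)                 # 0: decrease, 1: increase
            self.observation_space = spaces.Box(low=-10.0, high=10.0, shape=(1,))
            self._seed()
            self._reset()

        def _seed(self, seed=None):
            self.np_random, seed = seeding.np_random(seed)
            return [seed]

        def _step(self, action):
            assert self.action_space.contains(action)
            self.state += 1.0 if action == 1 else -1.0             # apply the action
            reward = -abs(self.state - self.target)                # closer to the target = higher reward
            done = abs(self.state - self.target) < 0.5             # end the episode near the target
            return np.array([self.state]), reward, done, {}

        def _reset(self):
            self.target = self.np_random.uniform(-5.0, 5.0)        # hidden goal of the episode
            self.state = 0.0
            return np.array([self.state])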

Optionally, you can create a _render method with something like

    from io import StringIO
    import sys

    def _render(self, mode='human', **kwargs):
        # write either to a string buffer ('ansi') or to stdout ('human')
        outfile = StringIO() if mode == 'ansi' else sys.stdout
        outfile.write('State: ' + repr(self.state) + ' Action: ' + repr(self.action_taken) + '\n')
        return outfile

Also, for better code flexibility, you can define the logic of your reward in a _get_reward method and the changes to the state resulting from taking an action in a _take_action method.
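
A sketch of that split, reusing the illustrative environment above (_take_action and _get_reward are just helper names following the suggestion, not part of the gym API):

    def _take_action(self, action):
        # mutate the internal state according to the chosen action
        self.action_taken = action                                 # remembered for _render
        self.state += 1.0 if action == 1 else -1.0

    def _get_reward(self):
        # keep the reward logic in one place so it is easy to tweak
        return -abs(self.state - self.target)

    def _step(self, action):
        self._take_action(action)
        reward = self._get_reward()
        done = abs(self.state - self.target) < 0.5
        return np.array([self.state]), reward, done, {}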

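Once the class behaves like a gym.Env, keras-rl agents can train on it directly. A rough sketch, assuming the illustrative environment above, a discrete action space, and keras-rl's DQNAgent (network size and hyperparameters are arbitrary):

    from keras.models import Sequential
    from keras.layers import Dense, Flatten
    from keras.optimizers import Adam
    from rl.agents.dqn import DQNAgent
    from rl.memory import SequentialMemory
    from rl.policy import BoltzmannQPolicy

    env = GuessDirectionEnv()
    nb_actions = env.action_space.n

    # small feed-forward Q-network; input shape follows keras-rl's
    # (window_length,) + observation_shape convention
    model = Sequential([
        Flatten(input_shape=(1,) + env.observation_space.shape),
        Dense(16, activation='relu'),
        Dense(nb_actions, activation='linear'),
    ])

    dqn = DQNAgent(model=model, nb_actions=nb_actions,
                   memory=SequentialMemory(limit=10000, window_length=1),
                   policy=BoltzmannQPolicy())
    dqn.compile(Adam(lr=1e-3), metrics=['mae'])
    dqn.fit(env, nb_steps=5000, verbose=1)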
Andriy Lazorenko answered Sep 19 '22