How to interpret the observations of RAM environments in OpenAI gym?

In some OpenAI gym environments, there is a "ram" version. For example: Breakout-v0 and Breakout-ram-v0.

Using Breakout-ram-v0, each observation is an array of length 128.

Question: How can I transform an observation of Breakout-v0 (which is a 160 x 210 image) into the form of an observation of Breakout-ram-v0 (which is an array of length 128)?

My idea is to train a model on the Breakout-ram-v0 and display the trained model playing using the Breakout-v0 environment.

Victor asked Jul 20 '17 07:07

2 Answers

There are a couple of ways to understand the RAM option.

Let's say you wanted to learn Pong. If you train from the pixels, you'll likely use a convolutional net of several layers. Interestingly, the final output of the convnet is a 1D array of features. You pass these to a fully connected layer and output the 'action' based on the features the convnet recognized in the image(s). Or you might run a reinforcement learning algorithm on the 1D array of features.

Now let's say it occurs to you that Pong is very simple and could probably be represented in a 16x16 image instead of 160x160. Straight downsampling doesn't give you enough detail, so you use OpenCV to extract the positions of the ball and paddles and create your mini 16x16 version of Pong, with nice, crisp pixels. The computation needed to represent the essence of the game is far less than with your deep net, and your new convnet is nice and small. Then you realize you don't even need your convnet any more: you can just connect a fully connected layer to each of your 16x16 pixels.

So, think about what you have: two different ways of getting a simple representation of the game on which to train your fully connected layer (or RL algorithm).

  1. Your deep convnet goes through several layers and outputs a 1D array, say of 256 features in the final layer. You pass that to the fully connected layer.
  2. Your manual feature extraction extracts the blobs (paddles/ball) with OpenCV to make a 16x16 Pong. When you pass that to your fully connected layer, it's really just a set of 16x16=256 'extracted features'.

So the pattern is that you find a simple way to 'represent' the state of the game, then pass that to your fully connected layers.
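As a rough illustration of option 2, here is a minimal sketch that turns object positions into a flattened 16x16 feature vector. It assumes the positions have already been extracted (for example with OpenCV); the function name `to_grid_features`, the 160x160 frame size, and the sample coordinates are all hypothetical and only serve the example.

```python
GRID = 16  # side length of the miniature board

def to_grid_features(objects, frame_w=160, frame_h=160, grid=GRID):
    """Draw (x, y) object positions into a grid x grid binary image,
    then flatten it row by row into a 1D feature list."""
    cells = [[0.0] * grid for _ in range(grid)]
    for x, y in objects:
        # Scale pixel coordinates to grid cells and clamp to the board.
        col = max(0, min(grid - 1, int(x * grid / frame_w)))
        row = max(0, min(grid - 1, int(y * grid / frame_h)))
        cells[row][col] = 1.0
    return [value for row in cells for value in row]

# Hypothetical detections: left paddle, right paddle, ball.
features = to_grid_features([(8, 80), (152, 64), (79, 40)])
print(len(features), int(sum(features)))  # 256 features, 3 of them set
```

The resulting list has the same shape as the 256 'extracted features' described above, so it could be fed straight into a fully connected layer.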

Enter option 3. The RAM of the game may be just a small byte array (128 bytes in these Atari environments, which is why each RAM observation has length 128). But you know this contains the 'state' of the game, so it's like your 16x16 version of Pong. It's most likely a 'better' representation than your 16x16 image because it probably has info about the direction of the ball, etc.

So now you have 3 different ways to simplify the state of the game in order to train your fully connected layer or your reinforcement learning algorithm.

So, by giving you the RAM, OpenAI is helping you avoid the task of learning a 'representation' of the game, which lets you move directly to learning a 'policy', that is, what to do based on the state of the game.
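To make "learning a policy directly on the state" concrete, here is a minimal sketch of a single fully connected layer that maps a 128-byte RAM observation to an action. The weights are random and untrained, the action count of 4 is an assumption (in practice you would read it from `env.action_space.n`), and the names `make_weights` and `act` are made up for this example.

```python
import random

RAM_SIZE = 128   # observation length reported in the question
N_ACTIONS = 4    # assumed size of the discrete action space

def make_weights(n_in=RAM_SIZE, n_out=N_ACTIONS, seed=0):
    """One row of small random weights per action (untrained)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
            for _ in range(n_out)]

def act(ram, weights):
    """Score every action with a linear layer and pick the best one."""
    x = [byte / 255.0 for byte in ram]  # scale RAM bytes to [0, 1]
    scores = [sum(w * v for w, v in zip(row, x)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)

weights = make_weights()
placeholder_ram = [0] * RAM_SIZE  # stands in for a real observation
action = act(placeholder_ram, weights)
print(action)
```

No convolutional layers are involved: the RAM already plays the role of the extracted features, and a training algorithm would only need to adjust `weights`.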

OpenAI may provide a way to 'see' the visual output on the RAM version. If they don't, you could ask them to make that available. But that's the best you will get. They are not going to reverse engineer the code to 'render' the RAM, nor are they going to reverse engineer the code to 'generate' RAM from pixels, which is not actually possible, since the pixels are only part of the state of the game.

They simply provide the RAM if it's easily available to them, so that you can try algorithms that learn what to do, assuming something is giving them a good state representation.

There is no (easy) way to do what you asked, that is, translate pixels to RAM, but most likely there is a way to ask the Atari system to give you both the RAM and the pixels, so you can work on RAM but show pixels.
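One way this could look in practice is sketched below: the policy only ever sees RAM observations, while the rendered frame of each step is collected for display. It assumes the classic gym API from around the time of the question, where `env.render(mode='rgb_array')` returns the current frame and `env.step` returns four values; newer gym releases changed both, so treat this as version-dependent. The helper name `rollout_with_frames` is made up for this example.

```python
def rollout_with_frames(env, policy, max_steps=1000):
    """Run one episode in which `policy` sees RAM observations,
    collecting the rendered RGB frame of every step for display."""
    frames = []
    ram = env.reset()
    for _ in range(max_steps):
        # Assumed classic gym API: render returns an RGB array.
        frames.append(env.render(mode='rgb_array'))
        ram, reward, done, info = env.step(policy(ram))
        if done:
            break
    return frames

# Example usage (requires gym with the Atari environments installed):
#   import gym
#   env = gym.make('Breakout-ram-v0')
#   frames = rollout_with_frames(env, lambda ram: env.action_space.sample())
```

Going the other way, older gym Atari environments expose the emulator as `env.unwrapped.ale`, and its `getRAM()` method returns the RAM bytes even when the environment was created as the pixel version; that attribute is an implementation detail and may differ between versions.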

AwokeKnowing answered Oct 03 '22 09:10


My idea is to train a model on the Breakout-ram-v0 and display the trained model playing using the Breakout-v0 environment.

Similar to erosten's answer: If your environment is

import gym
env = gym.make('Breakout-ram-v0')
env.reset()

and you want pixels, you're looking for

pixels = env.unwrapped._get_image()
Jacob Stern answered Oct 03 '22 07:10