
What does spaces.Discrete mean in OpenAI Gym

I am trying to learn the Monte Carlo (MC) method applied to blackjack using OpenAI Gym, and I do not understand these lines:

def __init__(self, natural=False):
    self.action_space = spaces.Discrete(2)
    self.observation_space = spaces.Tuple((
        spaces.Discrete(32),
        spaces.Discrete(11),
        spaces.Discrete(2)))
    self.seed()

Source: https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py

asked by doob, Aug 21 '19 01:08



1 Answer

The observation space and the action space are defined in the comments of the environment's source file:

Observation Space:

The observation of a 3-tuple of: the player's current sum,
the dealer's one showing card (1-10 where 1 is ace),
and whether or not the player holds a usable ace (0 or 1).

For example, (14, 9, False) means the player's current sum is 14, the dealer's showing card is 9, and the player has no usable ace (an ace is usable when it can count as 11 without the hand going over 21).
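To make the "usable ace" part concrete, here is a minimal sketch of how a hand could be scored and turned into that 3-tuple. The function names usable_ace, sum_hand and observation are illustrative assumptions, written from the rule described above rather than taken from the environment; cards are integers 1-10, with 1 standing for an ace.

```python
def usable_ace(hand):
    """Return True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Return the hand total, counting one ace as 11 when that is safe."""
    if usable_ace(hand):
        return sum(hand) + 10
    return sum(hand)


def observation(player_hand, dealer_hand):
    """Build the 3-tuple (player sum, dealer's showing card, usable ace)."""
    return (sum_hand(player_hand), dealer_hand[0], usable_ace(player_hand))


print(observation([10, 4], [9, 6]))  # a hard 14 against a dealer 9
print(observation([1, 6], [10, 2]))  # the ace is counted as 11 here
```

The first call reproduces the (14, 9, False) example; in the second, the ace is usable because 1 + 6 + 10 does not exceed 21.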

Action Space:

The player can request additional cards (hit=1) until they decide to stop
(stick=0) or exceed 21 (bust).

Discrete spaces are used when an action or observation takes one of a finite number of values. spaces.Discrete(n) is the set of integers {0, 1, ..., n-1}, so spaces.Discrete(2) describes a variable that can take one of two values, 0 or 1.
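As a rough illustration of what a discrete space represents, the sketch below uses a small stand-in class instead of importing gym, so it runs without the library installed. The class is hypothetical and heavily simplified; it only mirrors the n attribute and the contains() and sample() methods that gym's spaces.Discrete also provides.

```python
import random


class Discrete:
    """Hypothetical stand-in for gym's spaces.Discrete: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def contains(self, x):
        # A value belongs to the space if it is an integer in [0, n).
        return isinstance(x, int) and 0 <= x < self.n

    def sample(self):
        # Draw one valid value uniformly at random.
        return random.randrange(self.n)


action_space = Discrete(2)
print(action_space.contains(0))  # 0 = stick
print(action_space.contains(1))  # 1 = hit
print(action_space.contains(2))  # 2 lies outside Discrete(2)
print(action_space.sample())     # a random action, either 0 or 1
```

With the real library, env.action_space.sample() is the usual way to pick a random action, for instance when generating episodes under a random policy.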

In the Blackjack environment,

self.action_space = spaces.Discrete(2)
# spaces.Discrete(2) means the action is an integer, 0 (stick) or 1 (hit).

self.observation_space = spaces.Tuple((
        spaces.Discrete(32),
        spaces.Discrete(11),
        spaces.Discrete(2)))
# spaces.Discrete(32): the player's current sum, an integer in 0..31
# (sums above 21 occur when the player goes bust).
# spaces.Discrete(11): the dealer's showing card, an integer in 0..10.
# Only 1..10 actually occur: 1 is an ace, and 10 covers ten, jack, queen and king.
# spaces.Discrete(2): whether the player holds a usable ace, 0 (False) or 1 (True),
# where usable means the ace can count as 11 without busting.
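Since the question is about a tabular Monte Carlo method, it can help to see how the three component sizes bound the number of states. The sketch below is plain Python under the assumption that each component is an integer range 0..n-1; SIZES, NUM_ACTIONS and in_observation_space are names made up for this example, not part of gym.

```python
# Component sizes copied from the Tuple above: Discrete(32), Discrete(11), Discrete(2).
SIZES = (32, 11, 2)
NUM_ACTIONS = 2  # from action_space = spaces.Discrete(2)


def in_observation_space(obs):
    """Return True if each entry of obs lies in its component's range 0 .. n-1."""
    return len(obs) == len(SIZES) and all(0 <= int(v) < n for v, n in zip(obs, SIZES))


# The Tuple space is the Cartesian product of its components.
num_states = 1
for n in SIZES:
    num_states *= n

print(in_observation_space((14, 9, False)))   # the example observation from the answer
print(in_observation_space((14, 11, False)))  # 11 is outside Discrete(11)
print(num_states, num_states * NUM_ACTIONS)   # upper bounds on table sizes
```

The product 32 * 11 * 2 = 704 is only an upper bound: many of those tuples, such as a dealer card of 0, never occur in play, so a value table keyed by observed states will be smaller.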

answered by nsidn98, Sep 27 '22 22:09