I am trying to learn the Monte Carlo (MC) method applied to blackjack using OpenAI Gym, and I do not understand these lines:
def __init__(self, natural=False):
    self.action_space = spaces.Discrete(2)
    self.observation_space = spaces.Tuple((
        spaces.Discrete(32),
        spaces.Discrete(11),
        spaces.Discrete(2)))
    self.seed()
Source: https://github.com/openai/gym/blob/master/gym/envs/toy_text/blackjack.py
The basic structure of the environment is described by the observation_space and the action_space attributes of the Gym Env class. The observation_space defines the structure as well as the legitimate values for the observation of the state of the environment.
The MultiDiscrete action space allows controlling an agent with n-dimensional discrete action spaces. In my environment, I have 4 dimensions where each dimension has 11 actions. I'm trying to use A2C with a Softmax policy. Below is the implementation of the policy and value networks.
Wrappers override how the environment processes observations, rewards, and actions. Three classes provide this functionality. One of them, gym.ObservationWrapper, is used to modify the observations returned by the environment; to do this, override the wrapper's observation method.
OpenAI Gym is a toolkit for developing and testing learning agents. It is focused on, and best suited to, reinforcement learning agents, but it does not restrict you from trying other methods such as hard-coded game solvers or other deep learning approaches.
The observation space and the action space are defined in the comments of the environment's source code:
Observation Space:
The observation of a 3-tuple of: the player's current sum,
the dealer's one showing card (1-10 where 1 is ace),
and whether or not the player holds a usable ace (0 or 1).
e.g., (14, 9, False) means the player's current sum is 14, the dealer's showing card is 9, and the player has no usable ace (an ace can count as 1 or 11; it is "usable" when it can count as 11 without busting).
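The usable-ace rule can be sketched in a few lines. This follows the helper functions in the linked blackjack.py as far as I recall them, so treat the names `usable_ace` and `sum_hand` as assumptions rather than a verbatim copy. A hand is assumed to be a list of card values with an ace stored as 1.

```python
def usable_ace(hand):
    """True if the hand holds an ace that can count as 11 without busting."""
    return 1 in hand and sum(hand) + 10 <= 21


def sum_hand(hand):
    """Hand total, counting one ace as 11 when that does not bust."""
    if usable_ace(hand):
        return sum(hand) + 10
    return sum(hand)


# Ace and 6: the ace counts as 11, so the sum is 17 with a usable ace.
print(sum_hand([1, 6]), usable_ace([1, 6]))
# Ace, 6 and 9: counting the ace as 11 would bust, so it counts as 1.
print(sum_hand([1, 6, 9]), usable_ace([1, 6, 9]))
```

The first two entries of the observation tuple would then be the player's `sum_hand` value and the dealer's showing card, and the third is the `usable_ace` flag.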
Action Space:
The player can request additional cards (hit=1) until they decide to stop
(stick=0) or exceed 21 (bust).
Discrete spaces are used when the environment's action or observation space is discrete. spaces.Discrete(2) therefore describes a discrete variable that takes one of two possible values, 0 or 1.
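To make the idea concrete without installing Gym, here is a minimal stand-in class. It is a hypothetical sketch that only mimics the behaviour described here (the integers 0 to n-1, a membership check, and random sampling); it is not Gym's actual implementation.

```python
import random


class Discrete:
    """Hypothetical stand-in for gym.spaces.Discrete: the integers 0 .. n-1."""

    def __init__(self, n):
        self.n = n

    def contains(self, x):
        # A value is a member if it is an integer in the range [0, n).
        return isinstance(x, int) and 0 <= x < self.n

    def sample(self):
        # Draw one member uniformly at random.
        return random.randrange(self.n)


action_space = Discrete(2)
print(action_space.contains(0), action_space.contains(1))  # both are members
print(action_space.contains(2))  # 2 is out of range for Discrete(2)
print(action_space.sample() in (0, 1))
```

With the real library, the equivalent object would be created with spaces.Discrete(2), as in the question.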
In the Blackjack environment,
self.action_space = spaces.Discrete(2)
# spaces.Discrete(2) means the action is either 0 (stick) or 1 (hit).
self.observation_space = spaces.Tuple((
    spaces.Discrete(32),
    spaces.Discrete(11),
    spaces.Discrete(2)))
# spaces.Discrete(32) covers the integers 0 to 31, enough to hold every possible
# player sum (a hand of 21 that draws a 10 busts at 31).
# spaces.Discrete(11) covers the integers 0 to 10. The dealer's showing card is
# one of 1 to 10 (1 is an ace; 10 also stands for jack, queen and king), so 0 is never used.
# spaces.Discrete(2) is the usable-ace flag: 0 (False) or 1 (True).
# It is True if the player's ace can be counted as 11 without busting.
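As a rough check of how the three components fit together, the sketch below tests whether an observation falls inside the ranges implied by Discrete(32), Discrete(11) and Discrete(2). The helper `in_observation_space` is hypothetical and only mimics the range check; Gym's own Tuple space may be stricter about types.

```python
# Sizes of the three Discrete components of the blackjack observation space.
OBSERVATION_SIZES = (32, 11, 2)


def in_observation_space(obs):
    """Hypothetical helper: True if component i lies in 0 .. size_i - 1."""
    return len(obs) == len(OBSERVATION_SIZES) and all(
        0 <= int(value) < size for value, size in zip(obs, OBSERVATION_SIZES)
    )


print(in_observation_space((14, 9, False)))   # the example observation
print(in_observation_space((14, 11, False)))  # 11 is outside Discrete(11)
print(in_observation_space((32, 9, True)))    # 32 is outside Discrete(32)
```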