OpenAI Gym environment for multi-agent games

Is it possible to use OpenAI's Gym environments for multi-agent games? Specifically, I would like to model a card game with four players (agents). The player who scores a turn starts the next turn. How would I model the necessary coordination between the players (e.g. whose turn it is next)? Ultimately, I would like to use reinforcement learning on four agents that play against each other.

209

asked Jun 05 '17 13:06

Martin Studer

1 Answer

Yes, it is possible to use OpenAI Gym environments for multi-agent games. The Gym community has no standardized interface for multi-agent environments, but it is easy enough to build a Gym environment that supports multiple agents. For instance, in OpenAI's work on multi-agent particle environments, they define a multi-agent environment that inherits from gym.Env and takes the following form:

    import gym

    class MultiAgentEnv(gym.Env):
        def step(self, action_n):
            # One entry per agent in each of these containers.
            obs_n    = list()
            reward_n = list()
            done_n   = list()
            info_n   = {'n': []}
            # ...
            return obs_n, reward_n, done_n, info_n

We can see that the step function takes a list of actions (one per agent) and returns a list of observations, a list of rewards, and a list of dones while stepping the environment forward. This interface represents a Markov game, in which all agents act at the same time and each receives its own subsequent observation and reward.
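To make the list-in, list-out interface concrete, here is a minimal sketch of a simultaneous-move environment and the loop that drives it. The class `SimultaneousCoinEnv`, its coin-guessing rule, and its parameters are all made up for illustration. It also does not subclass gym.Env, so that it runs without any dependencies; in a real project you would inherit from gym.Env and define the action and observation spaces.

```python
import random


class SimultaneousCoinEnv:
    """Toy Markov-game environment: all agents act in the same step.

    Hypothetical example. It mirrors the list-in / list-out step()
    signature but is a plain class rather than a gym.Env subclass.
    """

    def __init__(self, n_agents=4, max_steps=5, seed=0):
        self.n_agents = n_agents
        self.max_steps = max_steps
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.coin = self.rng.randint(0, 1)  # hidden from the agents
        # Nothing has been revealed yet, so every agent observes None.
        return [None] * self.n_agents

    def step(self, action_n):
        assert len(action_n) == self.n_agents
        # Each agent is rewarded independently for guessing the hidden coin.
        reward_n = [1.0 if a == self.coin else 0.0 for a in action_n]
        revealed = self.coin
        self.coin = self.rng.randint(0, 1)
        self.t += 1
        done = self.t >= self.max_steps
        obs_n = [revealed] * self.n_agents  # everyone sees the revealed coin
        done_n = [done] * self.n_agents
        info_n = {"n": [{} for _ in range(self.n_agents)]}
        return obs_n, reward_n, done_n, info_n


env = SimultaneousCoinEnv()
obs_n = env.reset()
done_n = [False] * env.n_agents
totals = [0.0] * env.n_agents
while not all(done_n):
    # Random guesses stand in for one learned policy per agent.
    action_n = [random.randint(0, 1) for _ in range(env.n_agents)]
    obs_n, reward_n, done_n, info_n = env.step(action_n)
    totals = [t + r for t, r in zip(totals, reward_n)]
```

Each agent only ever needs its own slot of `obs_n` and `reward_n`, so four independent learners can be trained from the same loop.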

However, this kind of Markov game interface may not be suitable for all multi-agent environments. In particular, turn-based games (such as card games) might be better cast as an alternating Markov game, in which agents take turns (i.e. actions) one at a time. For this kind of environment, you may need to include whose turn it is in the representation of the state. Your step function would then take a single action and return a single observation, reward, and done flag.
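For the four-player card game in the question, a turn-based environment could look roughly like the sketch below. The rules are deliberately simplified and hypothetical (a single suit, the highest card wins the trick), and the names `TrickGameEnv`, `current_player`, and `trick_winner` are my own. As in the sketch's docstring, the class is plain Python rather than a gym.Env subclass so that it runs on its own. The key points are that the observation says whose turn it is, and that the trick winner becomes the next player to act.

```python
import random


class TrickGameEnv:
    """Toy four-player trick-taking game (hypothetical, simplified rules).

    One suit only: the highest card played in a trick wins it, and the
    winner leads the next trick. Plain class, not a gym.Env subclass.
    """

    def __init__(self, n_players=4, hand_size=3, seed=0):
        self.n_players = n_players
        self.hand_size = hand_size
        self.rng = random.Random(seed)

    def reset(self):
        deck = list(range(self.n_players * self.hand_size))
        self.rng.shuffle(deck)
        self.hands = [sorted(deck[i::self.n_players])
                      for i in range(self.n_players)]
        self.trick = []                    # (player, card) pairs played so far
        self.scores = [0] * self.n_players
        self.current_player = 0            # player 0 leads the first trick
        return self._obs()

    def _obs(self):
        # The observation tells the caller which agent has to act next.
        p = self.current_player
        return {"current_player": p,
                "hand": list(self.hands[p]),
                "trick": list(self.trick)}

    def step(self, card):
        p = self.current_player
        self.hands[p].remove(card)         # raises ValueError on an illegal card
        self.trick.append((p, card))
        reward, info = 0.0, {}
        if len(self.trick) == self.n_players:
            winner = max(self.trick, key=lambda pc: pc[1])[0]
            self.scores[winner] += 1
            reward = 1.0                   # addressed to the winner, who acts next
            info["trick_winner"] = winner
            self.trick = []
            self.current_player = winner   # the trick winner leads the next trick
        else:
            self.current_player = (p + 1) % self.n_players
        done = all(not hand for hand in self.hands)
        return self._obs(), reward, done, info


env = TrickGameEnv()
obs, done = env.reset(), False
while not done:
    # A random legal card stands in for the policy of obs["current_player"].
    card = random.choice(obs["hand"])
    obs, reward, done, info = env.step(card)
print(env.scores)
```

In this sketch the returned observation and reward are both addressed to the player named in `obs["current_player"]`. How to credit rewards to players who acted earlier in a trick is a design choice; other schemes (for example, returning per-player rewards in `info`) would work as well.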

190

answered Sep 18 '22 18:09

Jon Deaton

Tags:

reinforcement-learning

openai-gym
