Boosting my GA with Neural Networks and/or Reinforcement Learning

Tags: java, computer-science, artificial-intelligence, neural-network, theory

As I have mentioned in previous questions, I am writing a maze-solving application to help me learn about more theoretical CS subjects. After some trouble, I've got a Genetic Algorithm working that can evolve a set of rules (represented as boolean values) in order to find a good solution through a maze.

That being said, the GA alone is okay, but I'd like to beef it up with a Neural Network, even though I have no real working knowledge of Neural Networks (no formal theoretical CS education). After doing a bit of reading on the subject I found that a Neural Network could be used to train a genome in order to improve results. Let's say I have a genome (group of genes), such as

1 0 0 1 0 1 0 1 0 1 1 1 0 0...

How could I use a Neural Network (I'm assuming MLP?) to train and improve my genome?

In addition, since I know next to nothing about Neural Networks, I've also been looking into implementing some form of Reinforcement Learning using my maze matrix (a 2-dimensional array), but I'm a bit stuck on what the following algorithm wants from me:

(from http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/Q-Learning-Algorithm.htm)

   1. Set parameter γ (gamma), and environment reward matrix R
   2. Initialize matrix Q as zero matrix
   3. For each episode:
          * Select random initial state
          * Do while not reach goal state
                o Select one among all possible actions for the current state
                o Using this possible action, consider to go to the next state
                o Get maximum Q value of this next state based on all possible actions
                o Compute Q(state, action) = R(state, action) + γ * Max[Q(next state, all actions)]
                o Set the next state as the current state
            End Do
      End For

The big problem for me is implementing a reward matrix R, understanding what exactly a Q matrix is, and getting the Q value. I use a multi-dimensional array for my maze and enum states for every move. How would these be used in a Q-Learning algorithm?

If someone could help out by explaining what I would need to do to implement the above, preferably in Java (although C# would be fine too), possibly with some source code examples, it'd be appreciated.


asked Mar 17 '10 14:03

AlexT

2 Answers

As noted in some comments, your question involves a large body of background knowledge and topics that can hardly be covered well on Stack Overflow. What we can try here, however, is to suggest approaches to get around your problem.

First of all: what does your GA do? I see a set of binary values; what are they? I see them as either:

bad: a sequence of 'turn right' and 'turn left' instructions. Why is this bad? Because you're basically doing a random, brute-force attempt at solving your problem. You're not evolving a genotype: you're refining random guesses.
better: every gene (location in the genome) represents a feature that will be expressed in the phenotype. There should not be a 1-to-1 mapping between genome and phenotype!

Let me give you an example: our brain has roughly 10^11 neurons, yet our genome has only about 3×10^9 base pairs, coding for something like 20,000 genes (these are not exact values, bear with me for a second). What does this tell us? That our genotype does not encode every neuron. Our genome encodes the proteins that will then go and make the components of our body.

Hence, evolution works on the genotype indirectly, by selecting features of the phenotype. If I had 6 fingers on each hand, and if that made me a better programmer, and being more successful in life led me to have more kids, then my genotype would be selected by evolution because it contains the capability to give me a fitter body (yes, there is a pun there, given the average geekiness-to-reproducibility ratio of most people around here).

Now, think about your GA: what is it that you are trying to accomplish? Are you sure that evolving rules would help? In other words, how would you perform in a maze? What is the most successful thing that can help you: having a different body, or having a memory of the right path to get out? Perhaps you might want to reconsider your genotype and have it encode memorization abilities. Maybe encode in the genotype how much data can be stored and how fast your agents can access it, then measure fitness in terms of how fast they get out of the maze. Another (weaker) approach could be to encode the rules that your agent uses to decide where to go. The take-home message is: encode features that, once expressed, can be selected by fitness.
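To make the "encode the rules" option concrete, here is a minimal sketch of one possible genotype-to-phenotype mapping. Everything in it is an assumption for illustration, not something taken from your code: the agent is assumed to sense walls ahead, to its left and to its right (8 possible situations), and each situation gets 2 genes that select one of four actions, for a genome of 16 booleans. The names RuleTable, Action and decide are made up.

```java
/** Hypothetical phenotype: a lookup table from sensed situation to action. */
final class RuleTable {
    enum Action { FORWARD, TURN_LEFT, TURN_RIGHT, TURN_AROUND }

    // One action per situation; a situation is a 3-bit number built from the sensors.
    private final Action[] actions = new Action[8];

    /** Decodes a 16-gene genome: genes 2k and 2k+1 pick the action for situation k. */
    RuleTable(boolean[] genome) {
        if (genome.length != 16) {
            throw new IllegalArgumentException("expected 16 genes, got " + genome.length);
        }
        Action[] all = Action.values();
        for (int situation = 0; situation < 8; situation++) {
            int high = genome[2 * situation] ? 2 : 0;
            int low = genome[2 * situation + 1] ? 1 : 0;
            actions[situation] = all[high + low];
        }
    }

    /** Expresses the phenotype: looks up the action for the current sensor readings. */
    Action decide(boolean wallAhead, boolean wallLeft, boolean wallRight) {
        int situation = (wallAhead ? 4 : 0) + (wallLeft ? 2 : 0) + (wallRight ? 1 : 0);
        return actions[situation];
    }
}
```

With this kind of encoding, a mutation changes how the agent reacts to a whole class of situations rather than changing one step of one particular path, which is the property the fitness function can then select on.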

Now, to the neural network issue. One thing to remember is that NNs are filters. They receive an input, perform operations on it, and return an output. What is this output? Maybe you just need to discriminate a true/false condition; for example, once you feed a maze map to a NN, it can tell you if you can get out from the maze or not. How would you do such a thing? You will need to encode the data properly.

This is the key point about NNs: your input data must be encoded properly. Usually people normalize it, maybe scale it, and perhaps you can apply a sigmoid function to it to avoid values that are too large or too small; those are details that deal with error measures and performance. What you need to understand now is what a NN is, and what you cannot use it for.
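As a rough illustration of what "encoding the input" can mean in practice, here is a small sketch of two common steps: min-max scaling and the logistic sigmoid. The class and method names are made up for this example, and which step (if any) suits your maze data is something you would have to decide for yourself.

```java
import java.util.Arrays;

/** Hypothetical helper: scales raw sensor readings before they reach a network. */
final class InputEncoding {

    /** Min-max scaling: maps each value linearly into [0, 1]. */
    static double[] minMax(double[] raw) {
        double min = Arrays.stream(raw).min().orElse(0.0);
        double max = Arrays.stream(raw).max().orElse(0.0);
        double range = max - min;
        double[] scaled = new double[raw.length];
        for (int i = 0; i < raw.length; i++) {
            // A constant input carries no information; map it to the midpoint.
            scaled[i] = (range == 0.0) ? 0.5 : (raw[i] - min) / range;
        }
        return scaled;
    }

    /** Logistic sigmoid: squashes any real number into the interval (0, 1). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }
}
```

For example, wall distances of 2, 4 and 6 cells would be scaled to 0.0, 0.5 and 1.0 by minMax, so a far-away wall no longer dominates the weighted sums inside the network.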

To your problem now. You mentioned you want to use NNs as well: what about,

using a neural network to guide the agent, and
using a genetic algorithm to evolve the neural network parameters?

Rephrased like so:

let's suppose you have a robot: your NN is controlling the left and right wheel, and as input it receives the distance of the next wall and how much it has traveled so far (it's just an example)
you start by generating a random genotype
make the genotype into a phenotype: the first gene is the network sensitivity; the second gene encodes the learning rate; the third gene... and so on and so forth
now that you have a neural network, run the simulation
see how it performs
generate a second random genotype and build a second NN from it
see how this second individual performs
get the best individual, then either mutate its genotype or recombine it with the loser
repeat
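The loop above can be sketched roughly as follows, in the spirit of a microbial GA. Several simplifications are assumptions of this sketch: the genes are used directly as the 6 weights of a tiny one-layer network (rather than as sensitivity or learning rate), the fitness function is passed in by the caller because the simulation is yours to write, the copy and mutation rates (0.5, 0.1, 0.2) are arbitrary, and the loser of each tournament is the one that gets overwritten and mutated while the winner stays untouched.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

/** Hypothetical sketch: tournament-of-two evolution of network weights. */
final class MicrobialGa {
    static final int GENES = 6; // 2 outputs x (2 input weights + 1 bias)

    /** Phenotype: maps two sensor inputs to left/right wheel speeds in [-1, 1]. */
    static double[] control(double[] genome, double wallDistance, double travelled) {
        double left = Math.tanh(genome[0] * wallDistance + genome[1] * travelled + genome[2]);
        double right = Math.tanh(genome[3] * wallDistance + genome[4] * travelled + genome[5]);
        return new double[] {left, right};
    }

    /** Runs the tournaments and returns a copy of the fittest genome. Needs popSize >= 2. */
    static double[] evolve(ToDoubleFunction<double[]> fitness, int popSize,
                           int tournaments, Random rng) {
        double[][] pop = new double[popSize][GENES];
        for (double[] genome : pop) {
            for (int i = 0; i < GENES; i++) genome[i] = rng.nextDouble() * 2 - 1;
        }
        for (int t = 0; t < tournaments; t++) {
            int a = rng.nextInt(popSize);
            int b = (a + 1 + rng.nextInt(popSize - 1)) % popSize; // always differs from a
            boolean aWins = fitness.applyAsDouble(pop[a]) >= fitness.applyAsDouble(pop[b]);
            double[] winner = aWins ? pop[a] : pop[b];
            double[] loser = aWins ? pop[b] : pop[a];
            for (int i = 0; i < GENES; i++) {
                if (rng.nextDouble() < 0.5) loser[i] = winner[i];                 // recombination
                if (rng.nextDouble() < 0.1) loser[i] += rng.nextGaussian() * 0.2; // mutation
            }
        }
        double[] best = pop[0];
        for (double[] genome : pop) {
            if (fitness.applyAsDouble(genome) > fitness.applyAsDouble(best)) best = genome;
        }
        return best.clone();
    }
}
```

In your case the fitness function would run the maze simulation with control(genome, ...) steering the agent and return, say, the negated number of steps needed to reach the exit. Because the winner of each tournament is never modified, the best fitness in the population cannot get worse as long as the fitness evaluation is deterministic.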

There is an excellent read on the matter here: Inman Harvey, Microbial GA.

I hope I gave you some insight into these issues. NNs and GAs are no silver bullet that solves all problems. For some problems they can do a great deal; for others they are just the wrong tool. It's (still!) up to us to pick the best one, and to do so we must understand them well.

Have fun with it! It's great to know such things, it makes everyday life a bit more entertaining :)


answered Oct 15 '22 15:10

lorenzog

There is probably no 'maze gene' to find.

Genetic algorithms set up a vector of properties and a 'filtering system' that uses some kind of 'survival of the fittest' rule to find out which set of properties does the best job.

The easiest way to find a way out of a maze is to always follow the left (or right) wall. This is guaranteed to work when all the walls are connected to each other or to the outer boundary.
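A minimal sketch of the left-hand rule on a grid maze could look like this. The maze representation (a boolean[][] where true means wall), the class name WallFollower and the step budget are assumptions made for the example; adapt them to your own array and enum types.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: left-hand wall following on a grid maze. */
final class WallFollower {
    // Headings in clockwise order: 0 = north, 1 = east, 2 = south, 3 = west.
    private static final int[] DR = {-1, 0, 1, 0};
    private static final int[] DC = {0, 1, 0, -1};

    private static boolean isOpen(boolean[][] wall, int r, int c) {
        return r >= 0 && r < wall.length && c >= 0 && c < wall[0].length && !wall[r][c];
    }

    /**
     * Walks from start to goal, keeping the left hand on the wall.
     * Returns the visited cells as {row, col} pairs, or an empty list
     * if the goal is not reached within maxSteps moves.
     */
    static List<int[]> solve(boolean[][] wall, int startRow, int startCol,
                             int goalRow, int goalCol, int maxSteps) {
        List<int[]> path = new ArrayList<>();
        int r = startRow, c = startCol, heading = 0;
        path.add(new int[] {r, c});
        for (int step = 0; step < maxSteps && !(r == goalRow && c == goalCol); step++) {
            boolean moved = false;
            // Preference order relative to the heading: left, straight, right, back.
            for (int turn : new int[] {3, 0, 1, 2}) {
                int d = (heading + turn) % 4;
                if (isOpen(wall, r + DR[d], c + DC[d])) {
                    r += DR[d];
                    c += DC[d];
                    heading = d;
                    path.add(new int[] {r, c});
                    moved = true;
                    break;
                }
            }
            if (!moved) break; // boxed in on all four sides
        }
        return (r == goalRow && c == goalCol) ? path : new ArrayList<>();
    }
}
```

The returned path includes any dead ends the agent walked into and back out of, so it is a valid route rather than the shortest one.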

The Q-algorithm seems to have a problem with local maxima. As far as I remember, this was worked around by 'kicking' (adding random values to the matrix) if the results didn't improve.
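As for how the R and Q matrices from the question map onto a 2D maze array, here is a minimal sketch under a few assumptions of mine: each open cell is a state (numbered row * cols + col), the four moves are the actions, R is computed on demand (100 for stepping onto the goal, 0 otherwise) instead of being stored as a matrix, and every open cell can reach the goal. It follows the quoted steps with purely random action selection and does not include the random 'kick' mentioned above. All names are made up.

```java
import java.util.Random;

/** Hypothetical sketch: tabular Q-learning on a grid maze. */
final class MazeQLearning {
    // Actions as row/column offsets: 0 = UP, 1 = RIGHT, 2 = DOWN, 3 = LEFT.
    static final int[] DR = {-1, 0, 1, 0};
    static final int[] DC = {0, 1, 0, -1};

    final boolean[][] wall; // true means wall
    final int rows, cols, goalRow, goalCol;
    final double[][] q;     // q[state][action], where state = row * cols + col

    MazeQLearning(boolean[][] wall, int goalRow, int goalCol) {
        this.wall = wall;
        this.rows = wall.length;
        this.cols = wall[0].length;
        this.goalRow = goalRow;
        this.goalCol = goalCol;
        this.q = new double[rows * cols][4]; // Q starts as a zero matrix
    }

    boolean open(int r, int c) {
        return r >= 0 && r < rows && c >= 0 && c < cols && !wall[r][c];
    }

    /** R(state, action), computed on demand: 100 for stepping onto the goal, else 0. */
    double reward(int nextRow, int nextCol) {
        return (nextRow == goalRow && nextCol == goalCol) ? 100.0 : 0.0;
    }

    /** Maximum Q value over all actions of the given cell (never below 0 here). */
    double maxQ(int r, int c) {
        double best = 0.0;
        for (double v : q[r * cols + c]) best = Math.max(best, v);
        return best;
    }

    void train(int episodes, double gamma, Random rng) {
        for (int e = 0; e < episodes; e++) {
            int r, c;
            do { // random initial state: any open cell
                r = rng.nextInt(rows);
                c = rng.nextInt(cols);
            } while (!open(r, c));
            while (r != goalRow || c != goalCol) {
                int a;
                do { // select one among the possible actions at random
                    a = rng.nextInt(4);
                } while (!open(r + DR[a], c + DC[a]));
                int nr = r + DR[a], nc = c + DC[a];
                // Q(state, action) = R(state, action) + gamma * max Q(next state, all actions)
                q[r * cols + c][a] = reward(nr, nc) + gamma * maxQ(nr, nc);
                r = nr;
                c = nc;
            }
        }
    }

    /** Greedy policy after training: the legal action with the highest Q value, or -1. */
    int bestAction(int r, int c) {
        int best = -1;
        for (int a = 0; a < 4; a++) {
            if (!open(r + DR[a], c + DC[a])) continue;
            if (best < 0 || q[r * cols + c][a] > q[r * cols + c][best]) best = a;
        }
        return best;
    }
}
```

After calling train with, say, a few hundred episodes and gamma = 0.8, repeatedly calling bestAction from the start cell and applying the returned move walks the learned route to the goal. The Q value of a move that is k steps from the goal settles at 100 * gamma^(k-1), which is why following the largest value leads towards the exit.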

EDIT: As mentioned above, a backtracking algorithm suits this task better than a GA or NN. How to combine the two is described here: NeuroGen describes how a GA is used for training a NN.

answered Oct 15 '22 14:10

stacker

