Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Initial Genetic Programming Parameters

I did a little GP (note:very little) work in college and have been playing around with it recently. My question is in regards to the intial run settings (population size, number of generations, min/max depth of trees, min/max depth of initial trees, percentages to use for different reproduction operations, etc.). What is the normal practice for setting these parameters? What papers/sites do people use as a good guide?

like image 550
cmptrer Avatar asked May 06 '10 15:05

cmptrer


People also ask

What are the parameters of genetic algorithm?

Genetic Algorithm programs include a number of parameters including the probabilities of crossover and mutation, the population size and the number of generations. A factorial experiment has been performed to identify appropriate values for these factors that produce the best results within a given execution time.

What are the parameters of GA?

There are two basic parameters of GA - crossover probability and mutation probability. Crossover probability: how often crossover will be performed. If there is no crossover, offspring are exact copies of parents. If there is crossover, offspring are made from parts of both parent's chromosome.

How many parameters does a genetic algorithm have?

There are two basic parameters of GA - crossover probability and mutation probability. Crossover probability says how often will be crossover performed. If there is no crossover, offspring is exact copy of parents.

How do you generate initial population in genetic algorithm?

It has been observed that the entire population should not be initialized using a heuristic, as it can result in the population having similar solutions and very little diversity. It has been experimentally observed that the random solutions are the ones to drive the population to optimality.


2 Answers

You'll find that this depends very much on your problem domain - in particular the nature of the fitness function, your implementation DSL etc.

Some personal experience:

  • Large population sizes seem to work better when you have a noisy fitness function, I think this is because the growth of sub-groups in the population over successive generations acts to give more sampling of the fitness function. I typically use 100 for less noisy/deterministic functions, 1000+ for noisy.
  • For number of generations it is best to measure improvements in the fitness function and stop when it meets your target criteria. I normally run a few hundred generations and see what kind of answers are coming out, if it is showing no improvement then you probably have an issue elsewhere.
  • Tree depth requirements are really dependent on your DSL. I sometimes try to do an implementation without explicit limits but penalise or eliminate programs that run too long (which is probably what you really care about....). I've also found total node counts of ~1000 to be quite useful hard limits.
  • Percentages for different mutation / recombination operators don't seem to matter all that much. As long as you have a comprehensive set of mutations, any reasonably balanced distribution will usually work. I think the reason for this is that you are basically doing a search for favourable improvements so the main objective is just to make sure the trial improvements are reasonably well distributed across all the possibilities.
like image 173
mikera Avatar answered Oct 16 '22 12:10

mikera


Why don't you try using a genetic algorithm to optimise these parameters for you? :)

Any problem in computer science can be solved with another layer of indirection (except for too many layers of indirection.)

-David J. Wheeler

like image 33
Drew Noakes Avatar answered Oct 16 '22 11:10

Drew Noakes