Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Code generation by genetic algorithms

Evolutionary programming seems to be a great way to solve many optimization problems. The idea is very easy and the implementation does not make problems.

I was wondering if there is any way to evolutionarily create a program in ruby/python script (or any other language)?

The idea is simple:

  1. Create a population of programs
  2. Perform genetic operations (roulette-wheel selection or any other selection), create new programs with inheritance from best programs, etc.
  3. Loop point 2 until program that will satisfy our condition is found

But there are still few problems:

  1. How will chromosomes be represented? For example, should one cell of chromosome be one line of code?
  2. How will chromosomes be generated? If they will be lines of code, how do we generate them to ensure that they are syntactically correct, etc.?

Example of a program that could be generated:

Create script that takes N numbers as input and returns their mean as output.

If there were any attempts to create such algorithms I'll be glad to see any links/sources.

like image 371
dfens Avatar asked Apr 20 '11 15:04

dfens


People also ask

What is coding in genetic algorithm?

Coding is to map the phenotype data in solution space into genotype data in genetic structure. During iterations of GA, a coding string represents a solution and genetic operations are done by operating the bits of this string. So, the coding method also affects the genetic operators.

What is generation cycle in genetic algorithm?

There are many different methods of initializing populations, but with Genetic Algorithms the most popular method of initialization is simply to create a population of randomly initialized binary strings. Once the initial population has been created, the evolutionary generational cycle begins.

What is genetic algorithm in Python?

The genetic algorithm is a search-based optimization technique. It is frequently used to find the optimal or nearest optimal solution. It was introduced by John Holland. It is based on Darwins Natural Selection Theory.


2 Answers

If you are sure you want to do this, you want genetic programming, rather than a genetic algorithm. GP allows you to evolve tree-structured programs. What you would do would be to give it a bunch of primitive operations (while($register), read($register), increment($register), decrement($register), divide($result $numerator $denominator), print, progn2 (this is GP speak for "execute two commands sequentially")).

You could produce something like this:

progn2(   progn2(     read($1)     while($1       progn2(         while($1           progn2( #add the input to the total             increment($2)             decrement($1)           )         )         progn2( #increment number of values entered, read again           increment($3)           read($1)         )       )     )   )   progn2( #calculate result     divide($1 $2 $3)     print($1)   ) )   

You would use, as your fitness function, how close it is to the real solution. And therein lies the catch, that you have to calculate that traditionally anyway*. And then have something that translates that into code in (your language of choice). Note that, as you've got a potential infinite loop in there, you'll have to cut off execution after a while (there's no way around the halting problem), and it probably won't work. Shucks. Note also, that my provided code will attempt to divide by zero.

*There are ways around this, but generally not terribly far around it.

like image 196
Joel Rein Avatar answered Sep 30 '22 07:09

Joel Rein


It can be done, but works very badly for most kinds of applications.

Genetic algorithms only work when the fitness function is continuous, i.e. you can determine which candidates in your current population are closer to the solution than others, because only then you'll get improvements from one generation to the next. I learned this the hard way when I had a genetic algorithm with one strongly-weighted non-continuous component in my fitness function. It dominated all others and because it was non-continuous, there was no gradual advancement towards greater fitness because candidates that were almost correct in that aspect were not considered more fit than ones that were completely incorrect.

Unfortunately, program correctness is utterly non-continuous. Is a program that stops with error X on line A better than one that stops with error Y on line B? Your program could be one character away from being correct, and still abort with an error, while one that returns a constant hardcoded result can at least pass one test.

And that's not even touching on the matter of the code itself being non-continuous under modifications...

like image 42
Michael Borgwardt Avatar answered Sep 30 '22 08:09

Michael Borgwardt