Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Genetic Algorithm - Unordered variable length chromosomes - Crossover strategies?

I'm working on a genetic algorithm. The chromosomes are not ordered - meaning the order in which they appear in a member does not affect that members score. Also the number of chromosomes are not fixed. One member might have 1 chromosome, another may have over 100.

I'm working in Python and the chromosomes are stored in lists. Below is a simplified example of the structure:

member = [{"key1":"value","key2":"value"},{"key1":"value","key2":"value"},{"key1":"value","key2":"value"}]

Two example members (simplified) might be:

member1 = [{"a":1.5,"b":2.334563},{"a":769.0003413,"b":0.00023}]
member2 = [{"a":7,"b":432.993246927},{"a":99,"b":532.234},{"a":21,"b":712.2},{"a":432,"b":999.9999},{"a":932,"b":12}]

Repeating chromosomes that are out of order is OK in the application:

member3 = [{"a":1,"b":1},{"a":2,"b":2},{"a":2,"b":2},{"a":1,"b":1}]

Members

Each chromosome in a member is a mathematical function that takes a Unix epoc timestamp as an input and it outputs a value. This allows me to get 'the value' for any time using that member's function. The keys in the chromosomes are always the same - but the values are randomly generated during initial seeding values range from 0 to 100 with up to 100 decimal places.

Grading system

I'm grading the functions against real time-series data I have in an SQL database. The time-series data is being constantly updated with new values every 1 to 3 seconds. When I do a select on this data, I select where the epoc value is greater than current epoc - 5 seconds and order by descending, and limit the output to 1 row. I take that actual epoc value returned and it's one of the points I grade against.

I take all the points (epoc:value pairs) and use those to grade, feeding the member function the epoc, getting the member's value, and then subtracting that value from the real value - and take the absolute value of that.

It looks somewhat like this:

total = 0
for chromosome in member[chromosomes]:
    for epoc in epocs:
        thisValue = Calc(epoc,chromosome)
        total = total + abs(thisValue - getRealValue(epoc))

The function Calc takes the chromosome and the epoc value and outputs a float.

Zero is a perfect score. The higher the score, the worse the member. I average all the members scores and remove ones below average.

I've tried grading against a static set of data from the database, and I've tried dynamically grading against the last 24 hours - meaning the last 24 hours is always different as time flows. I've also tried the last 4 hours, last hour, and last 3 days.

Mutation System

I have set my mutation rate at 2%, but I've played around at higher percents with worse results. Only children are potentially mutated, not existing population (want to keep the elites). When a child is selected for mutation, it's values in it's chromosomes are randomly shifted by (addition or subtraction randomly) a decimal between 0 and 1 of up to 100 decimal places. This provides the slightest of change to the child's values - because a very slight change drastically alters the output of a chromosome's function.

My problem

The crossover methods I'm using right now result in premature convergence.

Crossover strategies I've tried

I've tried taking a random number of random chromosomes from each parent. I've tried taking the first half of the first parent and last half of the second parent. I've tried these methods so far:

# Number of chromosomes from parent 1.
parent1chromosomes = randomNumber(0,len(parent1['chromosomes']))

# Number of chromosomes from parent 2.
parent2chromosomes = randomNumber(0,len(parent2['chromosomes']))

child = {}
child['chromosomes'] = []

# Get parent 1 chromosomes into child.
for i in range(0,parent1chromosomes):
    child['chromosomes'].append(random.choice(parent1['chromosomes']))

# Get parent 2 chromosomes into child.
for i in range(0,parent2chromosomes):
    child['chromosomes'].append(random.choice(parent2['chromosomes']))

Note: randomNumber is a function that returns a random integer between the specified ranges.

Both attempts result in early convergence. The problem I'm trying to solve is extremely complex - I've tried population sizes from 10,000 to 1,000,000 so far.

Example performance

Here's a screen-shot of a recent run. I'm mapping the best scores (lowest) and the average scores. In this photo, it's graphing five different population's best and averages over time. These particular 5 populations are each 10,000 members using a 3 second sampling of real data, and are scoring against the last hour of real data dynamically - which is why the best gets worse - because the real data it's being graded against changed in a way that makes the best member worse. The best scores are off by thousands, which is radically inaccurate. Smaller populations result in faster early convergence.

enter image description here

My question

What are other ways to better handle crossover with variable length members where the order of the chromosomes do not matter and repeating chromosomes do not matter?

like image 573
Wayne Workman Avatar asked Oct 24 '25 00:10

Wayne Workman


1 Answers

Rather than:

for i in range(0,parent1chromosomes):
    child['chromosomes'].append(random.choice(parent1['chromosomes']))

Perhaps:

child['chromosomes'].extend(random.sample(parent1['chromosomes'], parent1chromosomes))

This would mean that you can only get duplicate chromosomes if you get them both from one parent or you get one copy from both parents.

like image 57
Dan D. Avatar answered Oct 25 '25 15:10

Dan D.