I am trying to get reproducible results with the genetic programming code in chapter 11 of "Programming Collective Intelligence" by Toby Segaran. However, simply setting the seed with "random.seed(55)" does not appear to work. Changing the original code from "from random import ...." to "import random" doesn't help, nor does changing Random(). All of these seem to do approximately the same thing: the trees start out building the same, then diverge.
In reading various entries about the behavior of random, I can find no reason, given his GP code, why this divergence should happen. Apart from the calls to random, there doesn't appear to be anything in the code with any variability that would account for this behavior. My understanding is that calling random.seed() should make every subsequent call deterministic, and since the code isn't threaded at all, I'm not sure how or why the divergence is happening.
Has anyone modified this code to behave reproducibly? Is there some form of calling random.seed() that may work better?
I apologize for not posting an example, but the code is obviously not mine (I'm adding only the call to seed and changing how random is called in the code) and this doesn't appear to be a simple issue with random (I've read all the entries on Python random here and many on the web in general).
Thanks. Mark L.
I had the same problem just now with some completely unrelated code. I believe my solution was similar to that in eryksun's answer, though I didn't have any trees. What I did have were some sets, and I was doing random.choice(list(set)) to pick values from them. Sometimes my results (the items picked) were diverging even with the same seed each time, and I was close to pulling my hair out. After seeing eryksun's answer here I tried random.choice(sorted(set)) instead, and the problem appears to have disappeared. I don't know enough about the inner workings of Python to explain it.
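One plausible explanation, offered as a sketch rather than a diagnosis of the code above: a set's iteration order depends on the hashes of its elements. String hashes are randomized per interpreter process in Python 3 unless PYTHONHASHSEED is fixed, and objects with the default hash depend on memory addresses, so list(set) can come out in a different order on each run even with the same seed. Sorting first gives random.choice a canonical sequence to index into. The function name pick_reproducibly and the sample data are made up for illustration.

```python
import random

def pick_reproducibly(items, seed):
    """Pick one element from a set, independent of the set's iteration order."""
    rng = random.Random(seed)   # private generator, seeded explicitly
    ordered = sorted(items)     # canonical order; elements must be mutually comparable
    return rng.choice(ordered)

# Hypothetical sample data
names = {"delta", "alpha", "charlie", "bravo"}
print(pick_reproducibly(names, 55))
```

This only works when the elements can be sorted. For elements with no natural ordering, sorting on some stable attribute via a key function would be the equivalent step.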
This may help, to create a random object that won't be interfered with from elsewhere:
from random import Random
random = Random(55)
# random can be used like the plain module
If other libraries are calling random.seed for any reason, they won't affect the random object you've created for your program.
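A small check of that claim, as a sketch: the helper draw below is hypothetical, and the random.seed(0) call inside the loop merely stands in for some other library reseeding the shared module-level generator.

```python
import random
from random import Random

def draw(n, seed):
    """Draw n floats from a private generator while the global one is reseeded."""
    rng = Random(seed)       # independent state, not shared with the random module
    values = []
    for _ in range(n):
        random.seed(0)       # simulated interference from elsewhere
        values.append(rng.random())
    return values

first = draw(5, 55)
second = draw(5, 55)
print(first == second)       # True
```

Note that this isolates the generator only. It does nothing about other sources of nondeterminism such as set ordering or comparisons based on object addresses.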
I added the following function to gp.py, changing nothing else:
def set_seed(n):
    import random
    random.seed(n)
I'm using the module based on the example on page 267 (Google books). I can confirm that I get divergent results for the following trial:
>>> import gp
>>> gp.set_seed(55)
>>> rf = gp.getrankfunction(gp.buildhiddenset())
>>> gp.evolve(2, 500, rf, mutationrate=0.2, breedingrate=0.1, pexp=0.7, pnew=0.1)
It starts to diverge as early as the 4th value printed. I'm restarting the interpreter between trials, so it's not any prior state that's causing the problem.
Edit:
I found the random element. It's the memory address of the trees. The rank function sorts the list of results, for which each item is a tuple of the score and tree. When two scores are equal, the tuple comparison falls through to the trees themselves, and in Python 2 objects without a custom comparison are ordered by memory address. Between runs the addresses change, and so the relative sort order of trees with equal score isn't a constant. Fortunately Python has a stable sort, so the fix is simple enough. Just use a sort key to sort based on only the score:
def getrankfunction(dataset):
    def rankfunction(population):
        scores = [(scorefunction(t, dataset), t) for t in population]
        scores.sort(key=lambda x: x[0])
        return scores
    return rankfunction
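To see the stable-sort behavior in isolation, here is a minimal sketch that does not depend on gp.py. The Tree class, the rank helper, and the scores are all stand-ins invented for the example. With a key on the score alone, tied items keep their original population order. As a side note, in Python 3 sorting the bare (score, tree) tuples would raise a TypeError on a tie instead of silently comparing addresses, since such objects are not orderable.

```python
class Tree:
    """Stand-in for a GP tree; defines no ordering of its own."""
    def __init__(self, name):
        self.name = name

def rank(population, score):
    """Return (score, tree) pairs sorted by score only; ties keep input order."""
    scored = [(score(t), t) for t in population]
    scored.sort(key=lambda pair: pair[0])   # never compares the Tree objects
    return scored

# Hypothetical population and scores, with deliberate ties
population = [Tree("a"), Tree("b"), Tree("c"), Tree("d")]
scores = {"a": 2, "b": 1, "c": 2, "d": 1}

ranked = rank(population, lambda t: scores[t.name])
print([t.name for _, t in ranked])   # ['b', 'd', 'a', 'c']
```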