I am writing an algorithm that brute-force searches for conditions on a 3x3 matrix, using a generator object to produce all possible combinations. Running it on a single thread would take a massive amount of time, but I have access to a machine with many cores (64), so parallelizing it across at least 20 workers looks very viable.
However, I cannot simply convert the generator object to a list and split that list into equal-sized chunks: the RAM required to store the resulting list of lists is far too high.
My single threaded approach (simplified for the question) is as follows:
def permute(xs, low=0):
    """Yield every permutation of xs by swapping elements in place.

    Note: the same list object is yielded each time, so a caller that
    keeps a permutation around must copy it first.
    """
    if low + 1 >= len(xs):
        yield xs
    else:
        for p in permute(xs, low + 1):
            yield p
        for i in range(low + 1, len(xs)):
            xs[low], xs[i] = xs[i], xs[low]
            for p in permute(xs, low + 1):
                yield p
            xs[low], xs[i] = xs[i], xs[low]

generator_obj = permute(list(range(9)))  # list() needed: a range is immutable
for l in generator_obj:
    search_conditions(l)
What would be a good approach to threading this?
Even if you have multiple threads, they will all live in the same process, and because of CPython's global interpreter lock (GIL) only one thread executes Python bytecode at a time. For CPU-bound work like this, threads will effectively use a single core; you want multiple processes instead.
Rather than splitting the data into a fixed number of equal chunks, why not create a set of batches on the fly? For instance, you can:
- pull a fixed number of permutations from the generator and serialize each batch to disk with pickle, msgpack, or a database
- spawn a subprocess.Popen to process each batch and write the results back to disk
This approach will use the power of your multi-core system, though some thought should be put into ensuring that the disk does not become a bottleneck.
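If the batches fit in memory long enough to be handed to a worker, you can skip the disk entirely and feed them straight to a process pool. Here is a minimal sketch of that idea: batches are sliced lazily off the generator with itertools.islice, and `search_batch` is a hypothetical stand-in for your real `search_conditions` check (I also use itertools.permutations in place of your custom `permute` for brevity):

```python
import itertools
from multiprocessing import Pool

def batches(iterable, size):
    """Yield successive lists of up to `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            return
        yield batch

def search_batch(batch):
    """Stand-in worker: count permutations in the batch satisfying a
    hypothetical condition (here: the first element is 0)."""
    return sum(1 for perm in batch if perm[0] == 0)

def parallel_search(n=9, batch_size=10000, workers=20):
    stream = itertools.permutations(range(n))  # stands in for permute()
    with Pool(workers) as pool:
        # imap_unordered consumes the batch generator lazily,
        # so RAM usage stays bounded by workers * batch_size
        return sum(pool.imap_unordered(search_batch,
                                       batches(stream, batch_size)))
```

Because only a bounded number of batches exist at any moment, memory stays flat no matter how many permutations the generator can produce.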
Edit: I would try this -> http://www.dabeaz.com/coroutines/coprocess.py
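The linked coprocess.py boils down to streaming pickled work over a pipe to a worker process. A rough sketch of that pattern, under the assumption that each batch is a list of permutations and the child counts the ones matching a hypothetical condition (the child is inlined via `python -c` here just to keep the example self-contained; in practice it would be a separate worker script):

```python
import pickle
import subprocess
import sys

# Inline stand-in for a worker script: read pickled batches from stdin
# until EOF, then write a single pickled result to stdout.
CHILD = r"""
import pickle, sys
total = 0
while True:
    try:
        batch = pickle.load(sys.stdin.buffer)
    except EOFError:
        break
    total += sum(1 for perm in batch if perm[0] == 0)  # hypothetical condition
pickle.dump(total, sys.stdout.buffer)
"""

def run_coprocess(batch_iter):
    """Feed each batch to a child process over a pipe and collect its result."""
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    for batch in batch_iter:
        pickle.dump(batch, proc.stdin)
    proc.stdin.close()  # EOF tells the child to finish and report
    result = pickle.load(proc.stdout)
    proc.wait()
    return result
```

Run several such children at once, round-robining batches between them, and you get the multi-process fan-out the answer describes without ever materializing the full permutation list.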