What is a neat way to divide huge nested loops among 8 (or more) processes using Python?

This time I'm facing a "design" problem. Using Python, I have to implement a mathematical algorithm that uses 5 parameters. To find the best combination of these 5 parameters, I used a 5-layer nested loop to enumerate all possible combinations in a given range. The time it takes to finish turned out to be far beyond my expectation. So I think it's time to use multithreading...

The task at the core of the nested loops is calculation and saving: in the current code, the result of every calculation is appended to a list, and the list is written to a file at the end of the program.

Since I don't have much experience with multithreading in any language, not to mention Python, I would like to ask for some hints on how the program should be structured for this problem: namely, how the calculations should be assigned to the threads dynamically, and how the threads should save their results and later combine them all into one file. I would also like the number of threads to be adjustable.

Any illustration with code would be very helpful.

Thank you very much for your time; I appreciate it.


Update, day 2: Thanks for all the helpful answers. I now know that what I need is multiprocessing rather than multithreading. I always confused the two concepts, because I thought that if a program is multithreaded, the OS would automatically run it on multiple processors when they are available. I will find time tonight to get some hands-on experience with multiprocessing.

asked Jul 04 '11 by Matt

2 Answers

You can try using jug, a library I wrote for very similar problems. Your code would then look something like

from jug import TaskGenerator
evaluate = TaskGenerator(evaluate)  # wrap your existing function as a jug task

results = []
for p0 in [1, 2, 3]:
    for p1 in range(10):
        for p2 in range(10, 20):
            for p3 in [True, False]:
                for p4 in range(100):
                    results.append(evaluate(p0, p1, p2, p3, p4))

Now you could run as many processes as you'd like (even across a network if you have access to a computer cluster).
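For example, assuming the script above is saved as yourscript.py (a placeholder name), you would start 8 workers by running jug execute yourscript.py in 8 shells. The workers coordinate through jug's on-disk store of intermediate results, which is also what lets them run on separate machines that share a filesystem.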

answered by luispedro

Multithreading in Python won't win you anything in this kind of problem, since CPython's Global Interpreter Lock prevents threads from executing Python code in parallel (threads are useful mostly for I/O concurrency).

You want multiprocessing instead, or a friendly wrapper for it such as joblib:

from joblib import Parallel, delayed

# n_jobs=-1 == use all available processors;
# each combination is a 5-tuple, so unpack it into evaluate's arguments
results = Parallel(n_jobs=-1)(delayed(evaluate)(*params)
                              for params in enum_combinations())
print(best_of(results))

Here, enum_combinations would enumerate all combinations of your five parameters; you can likely implement it by putting a yield at the bottom of your nested loops, as in the sketch below.
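For example, a minimal sketch of such a generator (the parameter ranges here are placeholders for your own):

def enum_combinations():
    # yield each combination of the five parameters as one tuple
    for p0 in [1, 2, 3]:
        for p1 in range(10):
            for p2 in range(10, 20):
                for p3 in [True, False]:
                    for p4 in range(100):
                        yield p0, p1, p2, p3, p4

Equivalently, itertools.product over the five ranges produces the same stream without the explicit nesting.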

joblib distributes the combinations over multiple worker processes, taking care of some load balancing.
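If you'd rather stay with the standard library, the same pattern works with multiprocessing.Pool directly. Here is a minimal sketch, assuming your calculation lives in a function called evaluate; the parameter ranges, the chunk size, and the output file name are placeholders:

import itertools
from multiprocessing import Pool

def evaluate(params):
    p0, p1, p2, p3, p4 = params
    score = p0 + p1 + p2 + int(p3) + p4  # placeholder for the real calculation
    return params, score

if __name__ == '__main__':
    # itertools.product flattens the 5 nested loops into one stream of tuples
    combos = itertools.product([1, 2, 3], range(10), range(10, 20),
                               [True, False], range(100))
    # 8 worker processes; chunksize batches tasks to cut down on IPC overhead
    with Pool(processes=8) as pool:
        results = pool.map(evaluate, combos, chunksize=100)
    # only the parent process writes the file, so the workers never need
    # to coordinate access to shared output
    with open('results.txt', 'w') as out:
        for params, score in results:
            out.write('%s\t%s\n' % (params, score))

Because the workers only return values and the parent does all the writing, this also answers the "how do the threads save results and combine them into one file" part of the question: they don't; the main process does.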

answered by Fred Foo