this time i'm facing a "design" problem. Using Python, I have a implement a mathematical algorithm which uses 5 parameters. To find the best combination of these 5 parameters, i used 5-layer nested loop to enumerate all possible combinations in a given range. The time it takes to finish appeared to be beyond my expectation. So I think it's the time to use multithreading...
The task in the core of nested loops are calculation and saving. In current code, result from every calculation is appended to a list and the list will be written to a file at the end of program.
since I don't have too much experience of multithreading in any language, not to mention Python, I would like to ask for some hints on what should the structure be for this problem. Namely, how should the calculations be assigned to the threads dynamically and how should the threads save results and later combine all results into one file. I hope the number of threads can be adjustable.
Any illustration with code will be very helpful.
thank you very much for your time, I appreciate it.
#update of 2nd Day: thanks for all helpful answers, now I know that it is multiprocessing instead of multithreading. I always confuse with these two concepts because I think if it is multithreaded then the OS will automatically use multiple processor to run it when available. I will find time to have some hands-on with multiprocessing tonight.
You can try using jug, a library I wrote for very similar problems. Your code would then look something like
from jug import TaskGenerator
evaluate = TaskGenerator(evaluate)
for p0 in [1,2,3]:
for p1 in xrange(10):
for p2 in xrange(10,20):
for p3 in [True, False]:
for p4 in xrange(100):
results.append(evaluate(p0,p1,p2,p3,p4))
Now you could run as many processes as you'd like (even across a network if you have access to a computer cluster).
Multithreading in Python won't win you anything in this kind of problem, since Python doesn't execute threads in parallel (it uses them for I/O concurrency, mostly).
You want multiprocessing
instead, or a friendly wrapper for it such as joblib:
from joblib import Parallel, delayed
# -1 == use all available processors
results = Parallel(n_jobs=-1)(delayed(evaluate)(x) for x in enum_combinations())
print best_of(results)
Where enum_combinations
would enumerate all combinations of your five parameters; you can likely implement it by putting a yield
at the bottom of your nested loop.
joblib distributes the combinations over multiple worker processes, taking care of some load balancing.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With