I have been trying to use the Python multiprocessing package to speed up some physics simulations I'm doing by taking advantage of the multiple cores of my computer.
I noticed that when I run my simulation, at most 3 of the 12 cores are used. In fact, when I start the simulation it initially uses 3 of the cores, and after a while it drops to 1 core. Sometimes only one or two cores are used from the start. I have not been able to figure out why; I basically change nothing between runs, except closing a few terminal windows with no active processes. (The OS is Red Hat Enterprise Linux 6.0; the Python version is 2.6.5.)
I experimented by varying the number of chunks (between 2 and 120) into which the work is split (i.e. the number of processes that are created), but this seems to have no effect.
I looked for info about this problem online and read through most of the related questions on this site (e.g. one, two) but could not find a solution.
(Edit: I just tried running the code under Windows 7 and it uses all available cores fine. I still want to fix this on the RHEL machine, though.)
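(For debugging, here is a small sketch of how I could check whether the interpreter is restricted to a subset of cores; it just reads the affinity mask the Linux kernel exposes in /proc. This is my own idea, not something from the answers, and I don't know yet whether affinity is actually the issue:)

# Debugging sketch: print which CPUs the current process is allowed to run on,
# using the Cpus_allowed_list line that the Linux kernel exposes in /proc.
def allowed_cpus(pid='self'):
    with open('/proc/%s/status' % pid) as f:
        for line in f:
            if line.startswith('Cpus_allowed_list'):
                return line.split(':', 1)[1].strip()
    return 'unknown'

print "this process may run on CPUs:", allowed_cpus()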
Here's my code (with the physics left out):
from multiprocessing import Queue, Process, current_process

def f(q, start, end):  # a dummy function to be passed as target to Process
    q.put(mc_sim(start, end))

def mc_sim(start, end):  # this is where the 'physics' is
    p = current_process()
    print "starting", p.name, p.pid
    sum_ = 0
    for i in xrange(start, end):
        sum_ += i
    print "exiting", p.name, p.pid
    return sum_

def main():
    NP = 0                    # number of processes
    total_steps = 10**8
    chunk = total_steps/10    # steps handled by each subprocess
    start = 0
    queue = Queue()
    subprocesses = []
    while start < total_steps:
        p = Process(target=f, args=(queue, start, start+chunk))
        NP += 1
        print 'delegated %s:%s to subprocess %s' % (start, start+chunk, NP)
        p.start()
        start += chunk
        subprocesses.append(p)
    total = 0
    for i in xrange(NP):
        total += queue.get()  # drain the queue before joining, to avoid blocking
    print "total is", total
    # two lines for consistency check:
    # alt_total = mc_sim(0, total_steps)
    # print "alternative total is", alt_total
    while subprocesses:
        subprocesses.pop().join()

if __name__ == '__main__':
    main()
(In fact the code is based on Alex Martelli's answer here.)
Edit 2: Eventually the problem resolved itself without me understanding how. I did not change the code, nor am I aware of having changed anything related to the OS. In spite of that, all cores are now used when I run the code. Perhaps the problem will reappear later, but for now I choose not to investigate further, as it works. Thanks to everyone for the help.
A CPython process normally has only one thread executing Python bytecode at a time because of the GIL. Despite the GIL, libraries that perform computationally heavy tasks, like numpy, scipy and pytorch, use C-based implementations under the hood and can release the GIL, allowing the use of multiple cores.
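A minimal sketch to illustrate the difference (my own example, not code from the question): the same CPU-bound loop run in threads effectively stays on one core because of the GIL, while running it in processes lets each worker use its own core.

import time
from threading import Thread
from multiprocessing import Process

def busy(n):
    # pure-Python CPU-bound work; holds the GIL the whole time
    s = 0
    for i in xrange(n):
        s += i

def run(worker_cls, label):
    t0 = time.time()
    workers = [worker_cls(target=busy, args=(10**7,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print label, "took", round(time.time() - t0, 2), "s"

if __name__ == '__main__':
    run(Thread, "4 threads")      # serialized by the GIL, roughly 1 core used
    run(Process, "4 processes")   # separate interpreters, up to 4 cores used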
You can still get some of the benefits of multiprocessing even without multiple cores, but the main benefit, and the reason the module was designed, is parallelism for speed. And obviously, without 4 cores you aren't going to cut your run time down to 25%.
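As a rough illustration (a sketch of mine, not the poster's code), the wall-clock gain from a worker pool depends directly on how many cores the workers can actually use; you can see it by timing the same job with different pool sizes:

import time
from multiprocessing import Pool

def chunk_sum(bounds):
    start, end = bounds
    return sum(xrange(start, end))

if __name__ == '__main__':
    total_steps = 10**8
    chunk = total_steps / 10
    ranges = [(i, i + chunk) for i in xrange(0, total_steps, chunk)]
    for nproc in (1, 2, 4):
        t0 = time.time()
        pool = Pool(processes=nproc)
        total = sum(pool.map(chunk_sum, ranges))  # splits the ranges over nproc workers
        pool.close()
        pool.join()
        print nproc, "workers:", round(time.time() - t0, 2), "s, total =", total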
Yes, it is. From https://docs.python.org/3/library/multiprocessing.html#exchanging-objects-between-processes: Queues are thread and process safe.
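A minimal sketch of what that means in practice (my example, not from the question): several worker processes can put results on one shared Queue, and the Queue does the necessary locking internally.

from multiprocessing import Process, Queue

def worker(q, n):
    q.put(n * n)  # safe even when several workers put concurrently

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=worker, args=(q, n)) for n in range(4)]
    for p in procs:
        p.start()
    results = [q.get() for _ in procs]  # one get per worker, then join
    for p in procs:
        p.join()
    print sorted(results)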
I ran your example on Ubuntu 12.04 x64 (kernel 3.2.0-32-generic) with Python 2.7.3 x64 on an i7 processor, and all 8 cores reported by the system were fully loaded (based on htop observation), so your problem, Sir, is based on the OS implementation, and the code is good.
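If you want a number instead of watching htop, here is a small sketch that samples per-core utilisation while the simulation runs (it assumes the third-party psutil package is installed, which is my assumption, not something mentioned in the question):

import psutil  # third-party package, assumed to be installed

if __name__ == '__main__':
    for _ in range(5):
        # one percentage per logical core, measured over a 1-second interval
        print psutil.cpu_percent(interval=1, percpu=True)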