I work on a virtual machine hosted on my company's mainframe.
I have 4 cores assigned to me, so I'm trying to get into parallel processing of my Python code. I'm not familiar with it yet, and I'm running into really unexpected behaviour: multiprocessing/threading takes longer than single-threaded processing. I can't tell whether I'm doing something wrong or whether the problem comes from my virtual machine.
Here's an example:
import multiprocessing as mg
import threading
import math
import random
import time

NUM = 4

def benchmark():
    for i in range(1000000):
        math.exp(random.random())

threads = []
random.seed()

print "Linear Processing:"
time0 = time.time()
for i in range(NUM):
    benchmark()
print time.time() - time0

print "Threading:"
for P in range(NUM):
    threads.append(threading.Thread(target=benchmark))
time0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print time.time() - time0

threads = []

print "Multiprocessing:"
for i in range(NUM):
    threads.append(mg.Process(target=benchmark))
time0 = time.time()
for t in threads:
    t.start()
for t in threads:
    t.join()
print time.time() - time0
The output looks like this:
Linear Processing:
1.125
Threading:
4.56699991226
Multiprocessing:
3.79200005531
Linear processing is the fastest here, which is the opposite of what I want and expected. I'm unsure how the join statements should be placed, so I also ran the example with the joins inside the start loop, like this:
for t in threads:
    t.start()
    t.join()
Now this leads to output like this:
Linear Processing:
1.11500000954
Threading:
1.15300011635
Multiprocessing:
9.58800005913
Now threading is almost as fast as single processing, while multiprocessing is even slower.
When I watch the processor load in the task manager, the individual load of the four virtual cores never rises above 30%, even during the multiprocessing run, so I suspect a configuration problem here.
I want to know if I'm doing the benchmarking correctly and if that behaviour is really as strange as I think it is.
So, firstly, you're not doing anything wrong. When I run your example on my MacBook Pro with CPython 2.7.12, I get:
$ python test.py
Linear Processing:
0.733351945877
Threading:
1.20692706108
Multiprocessing:
0.256340026855
However, the difference becomes more apparent when I change:
for i in range(1000000):
To:
for i in range(100000000):
With the larger workload, the results are:
Linear Processing:
77.5861060619
Threading:
153.572453976
Multiprocessing:
33.5992660522
Now why is threading consistently slower? Because of the Global Interpreter Lock (GIL): in CPython, only one thread can execute Python bytecode at a time, so CPU-bound threads cannot run in parallel, and you pay the thread-switching overhead on top. The threading module is mainly useful for I/O-bound work, where threads spend their time waiting and the GIL is released. Your multiprocessing example is the correct way to parallelize CPU-bound work like this.
So, in your original example, where Linear Processing was the fastest, I would blame the overhead of starting processes. When the amount of work is small, it can easily take longer to start 4 processes and wait for them to finish than to just do the work synchronously in a single process. Use a larger workload to benchmark more realistically.