Let's say I want to parallelize some compute-intensive work (not I/O-bound).
Naturally, I do not want to run more processes than there are available processors, or I would start paying for context switching (and cache misses).
Mentally, I would expect that as I increase n in multiprocessing.Pool(n), total time would fall until n reaches my number of processors and then climb again as context-switching overhead kicks in. But in actuality, the curve falls until n=4 and then stays flat:
#!/usr/bin/env python
from math import factorial

def pi(n):
    # Chudnovsky series approximation of pi with n terms
    pi = 0
    for k in range(n):
        t = ((-1)**k)*(factorial(6*k))*(13591409+545140134*k)
        deno = factorial(3*k)*(factorial(k)**3)*(640320**(3*k))
        pi += t/deno
    pi = pi * 12/(640320**(1.5))
    pi = 1/pi
    return pi

import multiprocessing
import time

maxx = 20
tasks = 60
task_complexity = 500
x = range(1, maxx+1)
y = [0]*maxx

for i in x:
    p = multiprocessing.Pool(i)
    tic = time.time()
    p.map(pi, [task_complexity]*tasks)
    toc = time.time()
    p.close()
    y[i-1] = toc-tic
    print('%2d %ds' % (i, y[i-1]))

import matplotlib.pyplot as plot
plot.plot(x, y)
plot.xlabel('Number of processes')
plot.xlim(1, maxx)
plot.xticks(x)
plot.ylabel('Time in seconds')
plot.show()
My machine: i3-3217U CPU @ 1.80GHz × 4
Operating system: Ubuntu 14.04
After n>4, I see the task manager rotating through the various processes, as expected since there are more processes than processors. Yet, there is no penalty relative to n=4 (my number of processors).
In fact, even when n<4, I see the scheduler frenetically rotating the processes through my processors, instead of assigning each process to its own processor and avoiding context switches.
I am seeing this behavior in gnome-system-monitor. (Please let me know if you have a different experience.)
Any explanation why it does not seem to matter how many processes I fire? Or is something wrong with my code?
My guess: the processes are not pinned to processors (even when only two processes are active, they keep switching CPUs), so I am paying for context switching anyway.
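One way to check this guess (a rough sketch of my own, not from the original code) is to read the kernel's context-switch counters for the current process with the standard-library resource module (Unix-only):

```python
import resource

def busy(n):
    # CPU-bound busy loop, a stand-in for the pi() worker
    s = 0
    for i in range(n):
        s += i * i
    return s

before = resource.getrusage(resource.RUSAGE_SELF)
busy(2_000_000)
after = resource.getrusage(resource.RUSAGE_SELF)

# ru_nvcsw:  voluntary switches (the process yielded, e.g. blocked on I/O)
# ru_nivcsw: involuntary switches (the scheduler preempted the process)
print('voluntary switches:  ', after.ru_nvcsw - before.ru_nvcsw)
print('involuntary switches:', after.ru_nivcsw - before.ru_nivcsw)
```

If the switch counts stay small relative to the run time, the cost of context switching is negligible, which would explain the flat curve.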
EDIT: updated graphic and code with higher constants.
"In fact, even when n<4, I see the scheduler frenetically rotating the processes through my processors, instead of assigning each process to its own processor and avoiding context switches."
Processes are not bound to a particular processor by default; one of the main reasons is to avoid unequal heating of the processor, which can cause mechanical stress and reduce its lifetime.
There are ways to force a process to run on a single core (look at the psutil module). This has advantages such as better use of the cache and avoiding context switches, but in most cases (if not all) it does not make a big difference in performance.
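For reference, on Linux pinning does not even require psutil; this small sketch uses the standard library's os.sched_setaffinity (Linux-only; psutil.Process.cpu_affinity() is the portable equivalent):

```python
import os

# 0 means "the calling process"
allowed = os.sched_getaffinity(0)
print('allowed cores:', sorted(allowed))

# Pin this process to a single core, then restore the original mask
one_core = {min(allowed)}
os.sched_setaffinity(0, one_core)
assert os.sched_getaffinity(0) == one_core
os.sched_setaffinity(0, allowed)
```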
So if you spawn more processes than you have cores, they will just act like threads and be switched among to optimize execution. Performance will only be (very) slightly lowered, since you were already context switching with fewer than 4 processes.
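To see how small the penalty is, a quick check (my own sketch, not the asker's code) is to time the same CPU-bound workload with a pool matched to the core count and with one four times larger:

```python
import multiprocessing
import time

def burn(n):
    # CPU-bound work; no I/O, no sleeping
    s = 0
    for i in range(n):
        s += i * i
    return s

if __name__ == '__main__':
    ncores = multiprocessing.cpu_count()
    for nproc in (ncores, 4 * ncores):
        with multiprocessing.Pool(nproc) as pool:
            tic = time.time()
            pool.map(burn, [200000] * 8 * ncores)
            print('%2d processes: %.2fs' % (nproc, time.time() - tic))
```

On a machine like the asker's, the two timings should come out very close, matching the flat tail of the graph.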
Answering my own question:
Firstly, I seem to have committed an error in my post. It is not true that the CPU in use changes frantically. If I fire two CPU-intensive processes, they keep changing cores, but only between two cores. My CPU (an i3-3217U) has 2 physical cores, each exposing 2 logical cores through hyperthreading. I guess what is going on is that each process is bouncing between the 2 logical cores of one physical core. It isn't Linux doing this; it is the CPU.
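The "soft" cores above are what the OS reports as logical CPUs; multiprocessing.cpu_count() already counts hyperthreads, which this small sketch confirms:

```python
import multiprocessing
import os

logical = multiprocessing.cpu_count()  # logical CPUs, hyperthreads included
usable = len(os.sched_getaffinity(0))  # CPUs this process may actually run on (Linux)
print('logical CPUs:', logical)
print('usable CPUs: ', usable)
```

So a Pool(multiprocessing.cpu_count()) on this machine already spreads work over the hyperthreads, not just the physical cores.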
That being said, I am still surprised that context switching is not more of a pain than it is.
EDIT: There is a nice discussion, with better empirical work than mine, on this blog.