I am trying to get my head around threading vs. CPU usage. There are plenty of discussions about threading vs. multiprocessing (a good overview being this answer) so I decided to test this out by launching a maximum number of threads on my 8 CPU laptop running Windows 10, Python 3.4.
My assumption was that all the threads would be bound to a single CPU.
EDIT: it turns out that it was not a good assumption. I now understand that for multithreaded code, only one piece of python code can run at once (no matter where/on which core). This is different for multiprocessing code (where processes are independent and run indeed independently).
While I read about these differences, it is one answer which actually clarified this point.
I think it also explains the CPU view below: that it is an average view of many threads spread out on many CPUs, but only one of them running at one given time (which "averages" to all of them running all the time).
It is not a duplicate of the linked question (which addresses the opposite problem, i.e. all threads on one core) and I will leave it hanging in case someone has a similar question one day and is hopefully helped by my enlightenment.
The code
import threading
import time
def calc():
time.sleep(5)
while True:
a = 2356^36
n = 0
while True:
try:
n += 1
t = threading.Thread(target=calc)
t.start()
except RuntimeError:
print("max threads: {n}".format(n=n))
break
else:
print('.')
time.sleep(100000)
Led to 889 threads being started.

The load on the CPUs was however distributed (and surprisingly low for a pure CPU calculation, the laptop is otherwise idle with an empty load when not running my script):

Why is it so? Are the threads constantly moved as a pack between CPUs and what I see is just an average (the reality being that at a given moment all threads are on one CPU)? Or are they indeed distributed?
As of today it is still the case that 'one thread holds the GIL'. So one thread is running at a time.
The threads are managed on the operating system level. What happens is that every 100 'ticks' (=interpreter instruction) the running thread releases the GIL and resets the tick counter.
Because the threads in this example do continuous calculations, the tick limit of 100 instructions is reached very fast, leading to an almost immediate release of the GIL and a 'battle' between threads starts to acquire the GIL.
So, my assumption is that your operating system has a higher than expected load , because of (too) fast thread switching + almost continuous releasing and acquiring the GIL. The OS spends more time on switching than actually doing any useful calculation.
As you mention yourself, for using more than one core at a time, it's better to look at multiprocessing modules (joblib/Parallel).
Interesting read: http://www.dabeaz.com/python/UnderstandingGIL.pdf
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With