Python: multiprocessing, 8/24 cores loaded

I have a machine with 24 physical cores (at least I was told so) running Debian: Linux 3.2.0-4-amd64 #1 SMP Debian 3.2.68-1+deb7u1 x86_64 GNU/Linux. It seems to be correct:

usr@machine:~/$ cat /proc/cpuinfo  | grep processor
processor   : 0
processor   : 1
<...>
processor   : 22
processor   : 23

I had some issues trying to load all cores with Python's multiprocessing.pool.Pool. I used Pool(processes=None); the docs say that Python uses cpu_count() if None is provided.
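A quick way to check whether a process is even allowed to use every core is to compare the logical CPU count with the process's affinity set. This is only a minimal sketch, assuming Python 3.3+ on Linux (os.sched_getaffinity does not exist on every platform); the helper name report_cpus is made up for illustration.

```python
import multiprocessing
import os


def report_cpus():
    """Return (logical CPU count, sorted list of CPUs this process may use)."""
    total = multiprocessing.cpu_count()
    # os.sched_getaffinity(0) returns the set of CPUs the calling
    # process is allowed to run on (0 means "this process").
    allowed = sorted(os.sched_getaffinity(0))
    return total, allowed


if __name__ == "__main__":
    total, allowed = report_cpus()
    print("logical CPUs: %d" % total)
    print("allowed CPUs: %s" % allowed)
```

If the second list is shorter than the first number, the scheduler will never place the process (or children that inherit its affinity) on the missing cores, no matter how many workers are started.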

Alas, only 8 cores were 100% loaded; the others remained idle (I used htop to monitor CPU load). I assumed I was misusing Pool, so I tried to start 24 processes "manually":

print 'Starting processes...'
procs = list()
for param_set in all_params:  # 24 items
    p = Process(target=_wrap_test, args=[param_set])
    p.start()
    procs.append(p)

print 'Now waiting for them.'
for p in procs:
    p.join()

I had 24 "greeting" messages from the processes I started:

Starting processes...
Executing combination: Session len: 15, delta: 10, ratio: 0.1, eps_relabel: 0.5, min_pts_lof: 5, alpha: 0.01, reduce: 500
< ... 22 more messages ... >
Executing combination: Session len: 15, delta: 10, ratio: 0.1, eps_relabel: 0.5, min_pts_lof: 7, alpha: 0.01, reduce: 2000
Now waiting for them.

But still only 8 cores were loaded:

[Image: htop screenshot showing the CPU load bars and the process table]

I've read here on SO that there may be issues with numpy, OpenBLAS and multicore execution. This is how I start my code:

OPENBLAS_MAIN_FREE=1 python -m tests.my_module

And after all imports I do:

os.system("taskset -p 0xff %d" % os.getpid())

So, here is the question: what should I do to get 100% load on all cores? Is this just my poor Python usage, or does it have something to do with OS limitations on multicore machines?

UPDATED: one more interesting thing is an inconsistency within the htop output. If you look at the image above, you'll see that the table below the CPU load bars shows 30-50% load for many more than 8 cores, which clearly disagrees with what the load bars say. top, on the other hand, agrees with the bars: 8 cores 100% loaded, the others idle.

UPDATED ONCE AGAIN:

I used this rather popular post on SO when I added the os.system("taskset -p 0xff %d" % os.getpid()) line after all imports. I have to admit that I didn't think too much when I did that, especially after reading this:

With this line pasted in after the module imports, my example now runs on all cores

I'm a simple man. I see "works like a charm", I copy and paste. Anyway, while playing with my code I eventually removed this line. After that my code began executing on all 24 cores for the "manual" Process starting scenario. For the Pool scenario the same problem remained, no matter whether the affinity trick was used or not.

I don't think it's a real answer 'cause I don't know what the issue is with Pool, but at least I managed to get all cores fully loaded. Thank you!

oopcode asked Jul 09 '15 14:07



2 Answers

In os.system("taskset -p 0xff %d" % os.getpid()), 0xff is a hexadecimal bitmask, 1111 1111 in binary. Each bit in the mask corresponds to one CPU core, and a bit value of 1 means that the process may be executed on that core. 0xff therefore allows only the first 8 cores (0 through 7). To run on 24 cores you need 24 set bits, which is 0xffffff.

Correct command:

os.system("taskset -p 0xffffff %d" % os.getpid())
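The mask for any core count can be derived rather than typed by hand: setting the lowest n bits is (1 << n) - 1. A small sketch in Python 3; the function name affinity_mask is hypothetical and not part of any library.

```python
def affinity_mask(n_cores):
    """Return a hex string with the lowest n_cores bits set."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    # Shifting 1 left by n and subtracting 1 yields n consecutive set bits.
    return hex((1 << n_cores) - 1)


if __name__ == "__main__":
    print(affinity_mask(8))   # 0xff
    print(affinity_mask(24))  # 0xffffff
```

The resulting string can be passed to taskset -p in place of a hard-coded mask.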
yraghu answered Oct 04 '22 17:10


Even though you solved the issue I'll try to explain it to clarify the ideas.

From what I have read, numpy does a lot of "magic" to improve performance. One of the tricks is to set the CPU affinity of the process.

CPU affinity is a setting of the OS scheduler. It restricts a given process to a fixed set of CPU cores, in the extreme case to a single core.

This can improve performance by reducing how often the CPU cache is invalidated and by increasing the benefit of locality of reference. For computationally heavy tasks these factors do matter.

What I don't like about numpy is that it does all this implicitly, which often puzzles developers.

Your processes were not running on all the cores because numpy sets the affinity of the parent process when you import the module. When you then spawn new processes, they inherit that affinity, so all of them end up fighting for a few cores instead of efficiently using all the available ones.
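The inheritance can be observed directly. The sketch below pins the parent to a single CPU, starts a child, and asks the child which CPUs it may use. It assumes Python 3 on Linux and uses the "fork" start method explicitly; the function names are made up for illustration, and the pinning is done by hand here rather than by numpy.

```python
import multiprocessing
import os


def _report(queue):
    # Runs in the child: send back the CPUs the child may run on.
    queue.put(sorted(os.sched_getaffinity(0)))


def child_affinity_when_pinned():
    """Pin the parent to one CPU, then return (pinned set, child's CPUs)."""
    ctx = multiprocessing.get_context("fork")  # assumption: Linux
    original = os.sched_getaffinity(0)
    pinned = {min(original)}  # any single allowed CPU will do
    os.sched_setaffinity(0, pinned)
    try:
        queue = ctx.Queue()
        child = ctx.Process(target=_report, args=(queue,))
        child.start()
        seen = queue.get()
        child.join()
    finally:
        # Restore the parent's original affinity.
        os.sched_setaffinity(0, original)
    return pinned, seen


if __name__ == "__main__":
    pinned, seen = child_affinity_when_pinned()
    print("parent pinned to:", sorted(pinned))
    print("child allowed on:", seen)
```

The child reports the same restricted set as the parent, which is the situation described above: many workers, few cores.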

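The os.system("taskset -p 0xff %d" % os.getpid()) command instructs the OS to overwrite the affinity mask of the process, replacing whatever numpy set. Note that 0xff covers only cores 0 to 7; on a 24-core machine the mask has to be 0xffffff to put the process back on all the cores.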

If you want to see it working with the Pool, you can use the following trick: pass an initializer that resets the affinity inside every worker process.

import os
from multiprocessing import Pool


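def set_affinity_on_worker():
    """When a new worker process is created, the affinity is set to all CPUs"""
    print("I'm the process %d, setting affinity to all CPUs." % os.getpid())
    # 0xffffff has 24 bits set, one per core of the 24-core machine in the
    # question; a mask of 0xff would cover only cores 0-7.
    os.system("taskset -p 0xffffff %d" % os.getpid())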


if __name__ == '__main__':
    p = Pool(initializer=set_affinity_on_worker)
    ...
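The same initializer idea can be written without shelling out to taskset, using os.sched_setaffinity from the standard library. This is a sketch under assumptions, not the answer's original method: Python 3.3+ on Linux, CPUs numbered 0 to cpu_count() - 1, the "fork" start method, and hypothetical helper names.

```python
import multiprocessing
import os


def reset_affinity():
    """Pool initializer: allow this worker to run on every CPU."""
    # Assumption: CPUs are numbered 0 .. cpu_count() - 1.
    os.sched_setaffinity(0, range(multiprocessing.cpu_count()))


def worker_affinity(_):
    # Report which CPUs the worker may run on after the initializer ran.
    return sorted(os.sched_getaffinity(0))


def run_pool(n_workers=2):
    """Start a pool with the initializer and collect worker affinities."""
    ctx = multiprocessing.get_context("fork")  # assumption: Linux
    with ctx.Pool(processes=n_workers, initializer=reset_affinity) as pool:
        return pool.map(worker_affinity, range(n_workers))


if __name__ == "__main__":
    for cpus in run_pool():
        print("worker may run on:", cpus)
```

Because the CPU set is built from cpu_count(), there is no hard-coded mask to get wrong when the code moves to a machine with a different number of cores.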
noxdafox answered Oct 04 '22 17:10