Python multiprocessing: restrict number of cores used

Tags: python, multiprocessing

I want to know how to distribute N independent tasks to exactly M processors on a machine that has L cores, where L>M. I don't want to use all the processors because I still want to have I/O available. The solutions I've tried seem to get distributed to all processors, bogging down the system.

I assume the multiprocessing module is the way to go.

I do numerical simulations. My background is in physics, not computer science, so unfortunately, I often don't fully understand discussions involving standard tasking models like server/client, producer/consumer, etc.

Here are some simplified models that I've tried:

Suppose I have a function run_sim(kwargs) (defined further below) that runs one simulation, a long list of kwargs dicts (one per simulation), and an 8-core machine.

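from multiprocessing import Pool, Process
import time

# using Pool: at most 4 worker processes exist at once
pool = Pool(4)
pool.map(run_sim, kwargs)

# using Process: keep at most 4 jobs alive at a time
max_live_jobs = 4
all_jobs = []
sim_index = 0
while sim_index < len(kwargs):
   number_of_live_jobs = len([1 for job in all_jobs if job.is_alive()])
   if number_of_live_jobs < max_live_jobs:
      # run_sim takes a single dict, so pass it as a positional argument
      p = Process(target=run_sim, args=(kwargs[sim_index],))
      print("starting job %s" % kwargs[sim_index]["data_file_name"])
      print("number of live jobs: %i" % number_of_live_jobs)
      p.start()
      all_jobs.append(p)
      sim_index += 1
   else:
      # all slots are busy; wait briefly instead of spinning
      time.sleep(0.1)

# wait for the remaining jobs to finish
for job in all_jobs:
   job.join()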

When I look at the processor usage with "top" and then press "1", all processors seem to get used anyway in either case. It is not out of the question that I am misinterpreting the output of "top", but if run_sim() is processor-intensive, the machine bogs down heavily.

Hypothetical simulation and data:

# simulation kwargs
numbers_of_steps = range(0,10000000, 1000000)
sigmas = list(range(11))
kwargs = []
for number_of_steps in numbers_of_steps:
   for sigma in sigmas:
      kwargs.append(
         dict(
            number_of_steps=number_of_steps,
            sigma=sigma,
            # why do I need to cast to int?
            data_file_name="walk_steps=%i_sigma=%i" % (number_of_steps, sigma),
            )
         )

import random, time
random.seed(time.time())

# simulation of random walk
def run_sim(kwargs):
   number_of_steps = kwargs["number_of_steps"]
   sigma = kwargs["sigma"]
   data_file_name = kwargs["data_file_name"]
   current_position = 0
   print("running simulation %s" % data_file_name)
   # the with-block closes the file even if the simulation raises
   with open(data_file_name + ".dat", "w") as data_file:
      for n in range(int(number_of_steps) + 1):
         data_file.write("step number %i   position=%f\n" % (n, current_position))
         random_step = random.gauss(0, sigma)
         current_position += random_step


asked Oct 15 '09 21:10

abalter

2 Answers

If you are on Linux, use taskset when you launch the program.

A child created via fork(2) inherits its parent’s CPU affinity mask. The affinity mask is preserved across an execve(2).

TASKSET(1)                Linux User’s Manual                TASKSET(1)

NAME
    taskset - retrieve or set a process’s CPU affinity

SYNOPSIS
    taskset [options] mask command [arg]...
    taskset [options] -p [mask] pid

DESCRIPTION
    taskset is used to set or retrieve the CPU affinity of a running process given its PID or to launch a new COMMAND with a given CPU affinity. CPU affinity is a scheduler property that "bonds" a process to a given set of CPUs on the system. The Linux scheduler will honor the given CPU affinity and the process will not run on any other CPUs. Note that the Linux scheduler also supports natural CPU affinity: the scheduler attempts to keep processes on the same CPU as long as practical for performance reasons. Therefore, forcing a specific CPU affinity is useful only in certain applications.

    The CPU affinity is represented as a bitmask, with the lowest order bit corresponding to the first logical CPU and the highest order bit corresponding to the last logical CPU. Not all CPUs may exist on a given system but a mask may specify more CPUs than are present. A retrieved mask will reflect only the bits that correspond to CPUs physically on the system. If an invalid mask is given (i.e., one that corresponds to no valid CPUs on the current system) an error is returned. The masks are typically given in hexadecimal.
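To make the bitmask convention from the man page concrete, here is a minimal sketch that turns a list of logical CPU indices into the hexadecimal mask taskset expects. The helper names cpus_to_mask and taskset_command, and the script name run_sims.py, are made up for illustration and are not part of taskset or of the original answer.

```python
def cpus_to_mask(cpus):
    """Return an affinity bitmask with one bit set per logical CPU index.

    Bit 0 (the lowest order bit) stands for the first logical CPU.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def taskset_command(cpus, command):
    """Build an argument list that launches `command` under taskset.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return ["taskset", hex(cpus_to_mask(cpus))] + list(command)


if __name__ == "__main__":
    # CPUs 0-3 of an 8-core machine; run_sims.py is a placeholder script name
    print(taskset_command(range(4), ["python", "run_sims.py"]))
```

The resulting list could be handed to subprocess.call, or the same mask typed directly on the command line. Because the affinity mask is inherited by child processes, every worker started by the script should stay on the selected CPUs.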

answered Sep 21 '22 14:09

John La Rooy

You might want to look into the following package:

http://pypi.python.org/pypi/affinity

It is a package that uses sched_setaffinity and sched_getaffinity.

The drawback is that it is highly Linux-specific.
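As a possible alternative to the third-party package, Python 3.3 and later expose the same two calls in the standard library as os.sched_setaffinity and os.sched_getaffinity, available on Linux only. The sketch below swaps in those stdlib calls; it is not the affinity package's API. It pins the parent process to a few CPUs before creating a Pool, on the assumption that the workers inherit the mask. The function names restrict_to and allowed_cpus are illustrative.

```python
import os
from multiprocessing import Pool


def allowed_cpus(_):
    """Report which CPUs the calling (worker) process may run on."""
    return sorted(os.sched_getaffinity(0))


def restrict_to(n_cpus):
    """Pin the current process to the first n_cpus CPUs it may already use.

    Linux only: os.sched_setaffinity is not available on other platforms.
    """
    available = sorted(os.sched_getaffinity(0))
    chosen = set(available[:n_cpus])
    os.sched_setaffinity(0, chosen)  # pid 0 means "the calling process"
    return chosen


if __name__ == "__main__":
    chosen = restrict_to(4)
    # Workers are created after the mask is set, so they should inherit it.
    with Pool(len(chosen)) as pool:
        for cpus in pool.map(allowed_cpus, range(len(chosen))):
            print(cpus)
```

With the mask in place, the scheduler can still move workers among the chosen CPUs, but it should leave the remaining cores free for other work.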

answered Sep 21 '22 14:09

terminus

