Using Python's multiprocessing.pool.map to manipulate the same integer

Problem

I'm using Python's multiprocessing module to execute functions asynchronously. I want to track the overall progress of my script as each process calls and executes add_print. For instance, I would like the code below to add 1 to total and print the value (1 2 3 ... 18 19 20) every time a process runs that function. My first attempt used a global variable, but this didn't work. Since the function runs in separate processes, each process reads total as 0 to start off and adds 1 independently of the others. So the output is twenty 1's instead of incrementing values.

How could I reference the same block of memory from my mapped function in a synchronized manner, even though the function is being run asynchronously? One idea I had was to somehow cache total in memory and then reference that exact block of memory when I add to total. Is this a possible and fundamentally sound approach in Python?

Please let me know if you need any more info or if I didn't explain something well enough.

Thanks!

Code

#!/usr/bin/python

## Import builtins
from multiprocessing import Pool 

total = 0

def add_print(num):
    global total
    total += 1
    print(total)


if __name__ == "__main__":
    nums = range(20)

    pool = Pool(processes=20)
    pool.map(add_print, nums)

asked Aug 03 '15 00:08

Austin A

1 Answer

You could use a shared Value:

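import multiprocessing as mp

def add_print(num):
    """
    https://eli.thegreenplace.net/2012/01/04/shared-counter-with-pythons-multiprocessing
    """
    # Increment and read under the same lock, so another worker cannot
    # change the value between the update and the print.
    with lock:
        total.value += 1
        current = total.value
    print(current)

def setup(t, l):
    # Runs once in each worker: expose the shared objects as globals there.
    global total, lock
    total = t
    lock = l

if __name__ == "__main__":
    total = mp.Value('i', 0)  # shared C int, initialised to 0
    lock = mp.Lock()
    nums = range(20)
    pool = mp.Pool(initializer=setup, initargs=[total, lock])
    pool.map(add_print, nums)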

The pool initializer calls setup once in each worker subprocess. setup makes total and lock global variables in that worker process, so add_print can access them when the worker calls it.
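
As a variant of the same initializer pattern, here is a minimal sketch that relies on the lock a Value already carries (available through get_lock()), so no separate mp.Lock is passed around. The names init_worker, increment and run_counter are made up for this illustration, and returning the value to the parent instead of printing it in the worker is just one possible design.

```python
import multiprocessing as mp

counter = None  # replaced in each worker by init_worker


def init_worker(shared_counter):
    # Runs once per worker process; keeps a reference to the shared Value.
    global counter
    counter = shared_counter


def increment(_):
    # get_lock() returns the lock that mp.Value created with the integer.
    with counter.get_lock():
        counter.value += 1
        return counter.value  # read while still holding the lock


def run_counter(n_tasks, n_workers=4):
    # Hypothetical helper: runs n_tasks increments and returns the final
    # count plus the value each task observed.
    shared = mp.Value('i', 0)
    with mp.Pool(processes=n_workers, initializer=init_worker,
                 initargs=(shared,)) as pool:
        seen = pool.map(increment, range(n_tasks))
    return shared.value, seen


if __name__ == "__main__":
    final, seen = run_counter(20)
    print(final)         # 20
    print(sorted(seen))  # each value from 1 to 20 appears exactly once
```

Because the read happens inside the locked block, no two tasks can observe the same value, although the order in which workers reach the lock is not fixed.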

Note that for CPU-bound work the number of processes should generally not exceed the number of CPUs your machine has. If it does, the excess subprocesses will wait around for a CPU to become available. So don't use processes=20 unless you have 20 or more CPUs. If you don't supply a processes argument, multiprocessing will detect the number of CPUs available and spawn a pool with that many workers for you. The number of tasks (e.g. the length of nums) usually greatly exceeds the number of CPUs. That's fine; the tasks are queued and each one is processed by a worker as one becomes available.
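
To see the queuing behaviour described above, a rough sketch like the following records which worker process handled each task. The helper names worker_pid and pids_for_tasks are invented for this example; the counts printed depend on the machine, so no specific output is claimed.

```python
import multiprocessing as mp
import os


def worker_pid(_):
    # Report which worker process handled this task.
    return os.getpid()


def pids_for_tasks(n_tasks, n_workers=None):
    # processes=None lets Pool choose its default size from the CPU count.
    with mp.Pool(processes=n_workers) as pool:
        return pool.map(worker_pid, range(n_tasks))


if __name__ == "__main__":
    pids = pids_for_tasks(20)
    print("CPUs reported:", mp.cpu_count())
    print("tasks run:", len(pids))
    print("distinct workers used:", len(set(pids)))
```

All 20 tasks complete even when the pool has fewer than 20 workers, and the number of distinct process IDs never exceeds the pool size.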

answered Sep 27 '22 22:09

unutbu

Tags:

python

asynchronous

multiprocessing

shared-memory

shared-state
