Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python Using Multiprocessing

I am trying to use multiprocessing in python 3.6. I have a for loopthat runs a method with different arguments. Currently, it is running one at a time which is taking quite a bit of time so I am trying to use multiprocessing. Here is what I have:

def test(self):
    for key, value in dict.items():
        pool = Pool(processes=(cpu_count() - 1))
        pool.apply_async(self.thread_process, args=(key,value))
        pool.close()
        pool.join()


def thread_process(self, key, value):
    # self.__init__()
    print("For", key)

I think what my code is using 3 processes to run one method but I would like to run 1 method per process but I don't know how this is done. I am using 4 cores btw.

like image 535
anderish Avatar asked Jun 20 '17 18:06

anderish


People also ask

Can we use multiprocessing in Python?

This is what gives multiprocessing an upper hand over threading in Python. Multiple processes can be run in parallel because each process has its own interpreter that executes the instructions allocated to it.

Why multiprocessing is used in Python?

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.

Should I use threading or multiprocessing?

Multiprocessing is a easier to just drop in than threading but has a higher memory overhead. If your code is CPU bound, multiprocessing is most likely going to be the better choice—especially if the target machine has multiple cores or CPUs.


2 Answers

You're making a pool at every iteration of the for loop. Make a pool beforehand, apply the processes you'd like to run in multiprocessing, and then join them:

from multiprocessing import Pool, cpu_count
import time

def t():
    # Make a dummy dictionary
    d = {k: k**2 for k in range(10)}

    pool = Pool(processes=(cpu_count() - 1))

    for key, value in d.items():
        pool.apply_async(thread_process, args=(key, value))

    pool.close()
    pool.join()


def thread_process(key, value):
    time.sleep(0.1)  # Simulate a process taking some time to complete
    print("For", key, value)

if __name__ == '__main__':
    t()
like image 98
Pedro von Hertwig Batista Avatar answered Sep 23 '22 16:09

Pedro von Hertwig Batista


You're not populating your multiprocessing.Pool with data - you're re-initializing the pool on each loop. In your case you can use Pool.map() to do all the heavy work for you:

def thread_process(args):
    print(args)

def test():
    pool = Pool(processes=(cpu_count() - 1))
    pool.map(thread_process, your_dict.items())
    pool.close()

if __name__ == "__main__":  # important guard for cross-platform use
    test()

Also, given all those self arguments I reckon you're snatching this off of a class instance and if so - don't, unless you know what you're doing. Since multiprocessing in Python essentially works as, well, multi-processing (unlike multi-threading) you don't get to share your memory, which means your data is pickled when exchanging between processes, which means anything that cannot be pickled (like instance methods) doesn't get called. You can read more on that problem on this answer.

like image 41
zwer Avatar answered Sep 24 '22 16:09

zwer