
multiprocessing.Pool() slower than just using ordinary functions

(This question is about how to make multiprocessing.Pool() run code faster. I finally solved it, and the final solution can be found at the bottom of the post.)

Original Question:

I'm trying to use Python to compare a word with many other words in a list and retrieve a list of the most similar ones. To do that I am using the difflib.get_close_matches function. I'm on a relatively new and powerful Windows 7 laptop, with Python 2.6.5.

What I want is to speed up the comparison process, because my comparison list of words is very long and I have to repeat the comparison several times. When I heard about the multiprocessing module, it seemed logical that if the comparison could be broken up into worker tasks and run simultaneously (trading machine power for speed), my comparison task would finish faster.

However, even after trying many different approaches, including methods shown in the docs and suggested in forum posts, the Pool method is incredibly slow, much slower than just running the original get_close_matches function on the entire list at once. I would like help understanding why Pool() is so slow and whether I am using it correctly. I'm only using this string comparison scenario as an example because it is the most recent case where I could not get multiprocessing to work for me rather than against me. Below is example code from the difflib scenario showing the time difference between the ordinary and the pooled methods:

    from multiprocessing import Pool
    import random, time, difflib

    # constants
    wordlist = ["".join([random.choice([letter for letter in "abcdefghijklmnopqersty"]) for lengthofword in xrange(5)]) for nrofwords in xrange(1000000)]
    mainword = "hello"

    # comparison function
    def findclosematch(subwordlist):
        matches = difflib.get_close_matches(mainword,subwordlist,len(subwordlist),0.7)
        if matches <> []:
            return matches

    # pool
    print "pool method"
    if __name__ == '__main__':
        pool = Pool(processes=3)
        t=time.time()
        result = pool.map_async(findclosematch, wordlist, chunksize=100)
        #do something with result
        for r in result.get():
            pass
        print time.time()-t

    # normal
    print "normal method"
    t=time.time()
    # run function
    result = findclosematch(wordlist)
    # do something with results
    for r in result:
        pass
    print time.time()-t

The word to be found is "hello", and the list of words in which to find close matches is a list of 1 million random 5-character strings (for illustration purposes only). I use 3 processor cores and the map function with a chunksize of 100 (the number of list items to be processed per worker, I think?). I also tried chunksizes of 1000 and 10 000, but there was no real difference. Notice that in both methods I start the timer right before calling my function and stop it right after looping through the results. As you can see below, the timing results are clearly in favor of the original non-Pool method:

    >>>
    pool method
    37.1690001488 seconds
    normal method
    10.5329999924 seconds
    >>>

The Pool method is about 3.5 times slower than the original method. Is there something I am missing here, or am I misunderstanding how pooling/multiprocessing works? I suspect that part of the problem could be that the map function returns None for every call without a match, and so adds thousands of unnecessary items to the results list, even though I only want actual matches to be returned and have written the function accordingly. From what I understand, that is just how map works. I have heard about other functions like filter that only collect non-False results, but I don't think that multiprocessing/Pool supports a filter method. Are there any other functions besides map/imap in the multiprocessing module that could help me return only what my function returns? The apply function is more for passing multiple arguments, as I understand it.
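A note on the code above: Pool.map (and map_async) calls the function once per element of the iterable, so with wordlist as the iterable each call to findclosematch receives a single word rather than a sublist; chunksize only controls how many elements are shipped to a worker per batch. On Windows each worker also re-imports the main module, so module-level code outside the `if __name__ == '__main__'` guard (building the million-word list, the "normal method" run) is executed again in every worker, which likely adds to the measured time. A minimal Python 3 sketch of one alternative, assuming the goal is to match against sublists and keep only real matches, is to pre-split the list and map over the chunks. The helper names `split_into_chunks`, `find_close_matches_in_chunk` and `parallel_close_matches` are hypothetical, and the list is kept small for illustration.

```python
import difflib
import random
from functools import partial
from multiprocessing import Pool


def split_into_chunks(items, chunk_len):
    """Split a list into consecutive sublists of at most chunk_len items."""
    return [items[i:i + chunk_len] for i in range(0, len(items), chunk_len)]


def find_close_matches_in_chunk(chunk, word, cutoff=0.7):
    """Return every word in chunk that is a close match for word (possibly [])."""
    # get_close_matches requires n > 0, hence the max() for an empty chunk.
    return difflib.get_close_matches(word, chunk, n=max(len(chunk), 1), cutoff=cutoff)


def parallel_close_matches(word, words, processes=3, chunk_len=5000):
    """Run one task per chunk, then flatten the per-chunk match lists."""
    chunks = split_into_chunks(words, chunk_len)
    worker = partial(find_close_matches_in_chunk, word=word)
    with Pool(processes=processes) as pool:
        per_chunk = pool.map(worker, chunks)
    # Every task returns a list, so chunks without matches simply add nothing.
    return [match for matches in per_chunk for match in matches]


if __name__ == "__main__":
    # Everything that should run only once lives under the guard.
    random.seed(0)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    words = ["".join(random.choice(alphabet) for _ in range(5)) for _ in range(20000)]
    words.append("hallo")
    print(parallel_close_matches("hello", words))
```

Because each task now returns a (possibly empty) list, flattening the results plays the role of the filter step asked about above. Whether this beats the single-process version still depends on how the matching time compares with the cost of sending each chunk to a worker.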

I know there's also the imap function, which I tried, but without any improvement in time. I suspect the reason is the same one that has kept me from understanding what's so great about the itertools module. It is supposedly "lightning fast", which I've noticed is true for calling the function, but in my experience and from what I've read that's because calling the function doesn't actually do any calculations. When it's time to iterate through the results to collect and analyze them (without which there would be no point in calling the function), it takes just as much time as, or sometimes more than, simply using the normal version of the function. But I suppose that's for another post.
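The laziness described above can be seen in a small Python 3 sketch, where the built-in map behaves the way itertools.imap did in Python 2: creating the iterator runs nothing, and the function is only called as results are consumed. The names `slow_square` and `calls` are hypothetical, chosen for illustration.

```python
calls = []  # records every argument the function was actually called with


def slow_square(x):
    calls.append(x)
    return x * x


lazy = map(slow_square, range(5))  # returns immediately, no calls yet
calls_before = len(calls)

results = list(lazy)               # the work happens here, during iteration
calls_after = len(calls)

print(calls_before, calls_after, results)  # 0 5 [0, 1, 4, 9, 16]
```

Note that Pool.imap is somewhat different: as far as I understand it, tasks are handed to the workers in the background as soon as it is called, and iterating only waits for results, so any speedup has to come from the workers rather than from the laziness itself.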

Anyway, I'm excited to see if someone can nudge me in the right direction here, and I'd really appreciate any help. I'm more interested in understanding multiprocessing in general than in getting this particular example to work, though some example solution code would help my understanding.

The Answer:

It seems the slowdown had to do with the slow startup time of the additional processes. I couldn't get Pool() to be fast enough. My final solution was to split the workload list manually, use multiple Process() instances instead of Pool(), and return the results in a Queue. I wonder, though, whether the most crucial change was splitting the workload by the main words to look for rather than by the words to compare against, perhaps because the difflib search function is already so fast. Here is the new code running 5 processes at the same time. It turned out to be about 10 times faster than the simple single-process code (6 seconds vs 55 seconds). This is very useful for fast fuzzy lookups, on top of how fast difflib already is.

    from multiprocessing import Process, Queue
    import difflib, random, time

    def f2(wordlist, mainwordlist, q):
        for mainword in mainwordlist:
            matches = difflib.get_close_matches(mainword,wordlist,len(wordlist),0.7)
            q.put(matches)

    if __name__ == '__main__':

        # constants (for 50 input words, find closest match in list of 100 000 comparison words)
        q = Queue()
        wordlist = ["".join([random.choice([letter for letter in "abcdefghijklmnopqersty"]) for lengthofword in xrange(5)]) for nrofwords in xrange(100000)]
        mainword = "hello"
        mainwordlist = [mainword for each in xrange(50)]

        # normal approach
        t = time.time()
        for mainword in mainwordlist:
            matches = difflib.get_close_matches(mainword,wordlist,len(wordlist),0.7)
            q.put(matches)
        print time.time()-t

        # split work into 5 or 10 processes
        processes = 5
        def splitlist(inlist, chunksize):
            return [inlist[x:x+chunksize] for x in xrange(0, len(inlist), chunksize)]
        print len(mainwordlist)/processes
        mainwordlistsplitted = splitlist(mainwordlist, len(mainwordlist)/processes)
        print "list ready"

        t = time.time()
        for submainwordlist in mainwordlistsplitted:
            print "sub"
            p = Process(target=f2, args=(wordlist,submainwordlist,q,))
            p.Daemon = True
            p.start()
        for submainwordlist in mainwordlistsplitted:
            p.join()
        print time.time()-t
        while True:
            print q.get()
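A few details in the listing above are worth a second look: the attribute is spelled `p.Daemon`, whereas the one multiprocessing reads is lowercase `daemon`; the second loop joins only the last process created; and the final `while True` loop blocks forever once the queue is empty. Below is a Python 3 sketch of the same Process-plus-Queue approach that addresses these points. It assumes one result per main word, reads the queue before joining (the multiprocessing docs advise draining a queue before joining processes that put items on it), and uses much smaller lists than the original so it finishes quickly. The names `match_words`, `split_list` and `parallel_lookup` are hypothetical.

```python
import difflib
import random
from multiprocessing import Process, Queue


def match_words(wordlist, mainwords, queue):
    """Worker: look up each main word and put (word, matches) on the queue."""
    for mainword in mainwords:
        matches = difflib.get_close_matches(mainword, wordlist, len(wordlist), 0.7)
        queue.put((mainword, matches))


def split_list(items, n_parts):
    """Split items into n_parts slices of near-equal length (some may be empty)."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def parallel_lookup(wordlist, mainwords, n_processes=5):
    """Split the main words across processes and collect one result per word."""
    queue = Queue()
    workers = [
        Process(target=match_words, args=(wordlist, part, queue), daemon=True)
        for part in split_list(mainwords, n_processes)
        if part  # skip empty slices
    ]
    for p in workers:
        p.start()
    # Drain the queue first: exactly one item arrives per main word.
    results = [queue.get() for _ in mainwords]
    for p in workers:
        p.join()
    return results


if __name__ == "__main__":
    random.seed(0)
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    wordlist = ["".join(random.choice(alphabet) for _ in range(5)) for _ in range(20000)]
    results = parallel_lookup(wordlist, ["hello"] * 10, n_processes=5)
    print(len(results))
```

Results arrive in completion order rather than input order, which is why each item carries its main word. If a worker raises an exception, the `queue.get()` calls in this sketch would wait indefinitely, so real code would want a timeout or error reporting.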
Karim Bahgat asked Dec 22 '13 07:12



2 Answers

These problems usually boil down to the following:

The function you are trying to parallelize doesn't require enough CPU resources (i.e. CPU time) to justify parallelization!

Sure, when you parallelize with multiprocessing.Pool(8), you theoretically (but not practically) could get an 8x speed up.

However, keep in mind that this isn't free - you gain this parallelization at the expense of the following overhead:

  1. Creating a task for every chunk (of size chunksize) in your iter passed to Pool.map(f, iter)
  2. For each task
    1. Serialize the task, and the task's return value (think pickle.dumps())
    2. Deserialize the task, and the task's return value (think pickle.loads())
    3. Waste significant time waiting for Locks on shared memory Queues, while worker processes and parent processes get() and put() from/to these Queues.
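  3. One-time cost of starting each worker process, which is expensive (a call to os.fork() on Unix; on Windows, where os.fork() is not available, a new interpreter is spawned for each worker).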

In essence, when using Pool() you want:

  1. High CPU resource requirements
  2. Low data footprint passed to each function call
  3. Reasonably long iter to justify the one-time cost of (3) above.
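A rough way to check the first two criteria before reaching for Pool is to time the work on one chunk against a pickle round trip of the same data. The Python 3 sketch below does that for a difflib lookup. It only approximates the serialization part of the overhead (not lock contention or process startup), and the helper names `time_call` and `pickle_round_trip`, as well as the sample word list, are hypothetical.

```python
import difflib
import pickle
import time


def time_call(func, *args):
    """Return (result, elapsed seconds) for a single call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def pickle_round_trip(obj):
    """Serialize and deserialize obj, as happens to task arguments and results."""
    return pickle.loads(pickle.dumps(obj))


words = ["hallo", "world", "jello", "help"] * 2500  # 10,000 short strings

# Cost of the actual work on this chunk.
matches, compute_s = time_call(
    difflib.get_close_matches, "hello", words, len(words), 0.7
)
# Approximate cost of shipping the chunk to a worker and the matches back.
copy, send_s = time_call(pickle_round_trip, words)
_, recv_s = time_call(pickle_round_trip, matches)

print(f"compute: {compute_s:.4f}s, pickling: {send_s + recv_s:.4f}s")
```

If the pickling time is not small compared with the compute time, splitting the work into more tasks is unlikely to help, and larger chunks or less data per call are worth trying first.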

For a more in-depth exploration, this post and linked talk walk through how passing large data to Pool.map() (and friends) gets you into trouble.

Raymond Hettinger also talks about proper use of Python's concurrency here.

The Aelfinn answered Sep 30 '22 03:09


My best guess is inter-process communication (IPC) overhead. In the single-process instance, the single process has the word list. When delegating to various other processes, the main process needs to constantly shuttle sections of the list to other processes.

Thus, a better approach might be to spin off n processes, each of which is responsible for loading or generating its own 1/n segment of the list and checking whether the word is in that segment.

I'm not sure how to do that with Python's multiprocessing library, though.
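One way this could look with multiprocessing, as a Python 3 sketch: pass each worker only a seed, a segment size and the word, let it generate its own slice of the list, and return just the matches. Random generation stands in here for loading real data, and the names `match_in_own_segment` and `search_segments` and the parameter values are hypothetical.

```python
import difflib
import random
from multiprocessing import Pool

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def match_in_own_segment(task):
    """Build this task's segment locally, then return its close matches."""
    seed, segment_len, word = task
    rng = random.Random(seed)  # per-segment generator; stands in for loading data
    segment = [
        "".join(rng.choice(ALPHABET) for _ in range(5)) for _ in range(segment_len)
    ]
    return difflib.get_close_matches(word, segment, n=segment_len, cutoff=0.7)


def search_segments(word, n_segments=4, segment_len=25000):
    """Search n_segments independently generated segments in parallel."""
    # Only three small values cross the process boundary per task.
    tasks = [(seed, segment_len, word) for seed in range(n_segments)]
    with Pool(processes=n_segments) as pool:
        per_segment = pool.map(match_in_own_segment, tasks)
    return [match for matches in per_segment for match in matches]


if __name__ == "__main__":
    print(len(search_segments("hello")))
```

The data sent to each worker is tiny, so the inter-process traffic is limited to the matches coming back. With real data, each worker would instead open its own file or slice of a file.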

Multimedia Mike answered Sep 30 '22 03:09