 

Python multiprocessing vs. threading for CPU-bound work on Windows and Linux

So I knocked up some test code to see how the multiprocessing module would scale on CPU-bound work compared to threading. On Linux I get the performance increase that I'd expect:

linux (dual quad core xeon):
serialrun took 1192.319 ms
parallelrun took 346.727 ms
threadedrun took 2108.172 ms

My dual core macbook pro shows the same behavior:

osx (dual core macbook pro):
serialrun took 2026.995 ms
parallelrun took 1288.723 ms
threadedrun took 5314.822 ms

I then went and tried it on a windows machine and got some very different results.

windows (i7 920):
serialrun took 1043.000 ms
parallelrun took 3237.000 ms
threadedrun took 2343.000 ms

Why, oh why, is the multiprocessing approach so much slower on Windows?

Here's the test code:

#!/usr/bin/env python

import multiprocessing
import threading
import time

def print_timing(func):
    def wrapper(*arg):
        t1 = time.time()
        res = func(*arg)
        t2 = time.time()
        print '%s took %0.3f ms' % (func.func_name, (t2 - t1) * 1000.0)
        return res
    return wrapper

def counter():
    for i in xrange(1000000):
        pass

@print_timing
def serialrun(x):
    for i in xrange(x):
        counter()

@print_timing
def parallelrun(x):
    proclist = []
    for i in xrange(x):
        p = multiprocessing.Process(target=counter)
        proclist.append(p)
        p.start()

    for i in proclist:
        i.join()

@print_timing
def threadedrun(x):
    threadlist = []
    for i in xrange(x):
        t = threading.Thread(target=counter)
        threadlist.append(t)
        t.start()

    for i in threadlist:
        i.join()

def main():
    serialrun(50)
    parallelrun(50)
    threadedrun(50)

if __name__ == '__main__':
    main()
asked Aug 17 '09 by manghole

People also ask

Which is better multiprocessing or multithreading in Python?

Multiprocessing is easier to drop in than threading, but has a higher memory overhead. If your code is CPU-bound, multiprocessing is most likely going to be the better choice, especially if the target machine has multiple cores or CPUs.

Does Python multiprocessing work on Windows?

The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
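That GIL-side-stepping behaviour is easiest to see with a worker pool. A minimal sketch in modern Python 3 (busy() is an illustrative stand-in for real CPU-bound work):

```python
import multiprocessing

def busy(n):
    # CPU-bound stand-in: sum the first n integers in a plain Python loop
    total = 0
    for i in range(n):
        total += i
    return total

if __name__ == '__main__':
    # The __main__ guard matters on Windows, where worker processes are
    # spawned by re-importing this module instead of being fork()ed.
    with multiprocessing.Pool() as pool:
        print(pool.map(busy, [100000] * 4))
        # prints [4999950000, 4999950000, 4999950000, 4999950000]
```

Each of the four calls runs in its own process, so each one holds its own GIL and they can execute on separate cores.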

Is multiprocessing faster than multithreading in Python?

Threading also suits programs that are I/O-bound or network-bound, such as web scrapers. Multiprocessing, by contrast, outshines threading in cases where the program is CPU-intensive and doesn't have to do any I/O or user interaction.

When would you want to use multiprocessing vs threading?

Multiprocessing increases computing power by spreading work across two or more CPUs in separate processes, whereas multithreading generates multiple threads of execution within a single process.


2 Answers

The Python documentation for multiprocessing blames the lack of os.fork() for the problems on Windows. It may be applicable here.
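You can get a feel for the cost of that missing fork() without a Windows box. A sketch, assuming Python 3.4+ where multiprocessing.get_context() exists: force the 'spawn' start method (which is what Windows must use) on Linux and compare it with 'fork':

```python
import multiprocessing
import time

def counter():
    # Same busy loop as in the question
    for i in range(1000000):
        pass

def timed_run(ctx, n=5):
    # Start and join n worker processes created from the given context
    t1 = time.time()
    procs = [ctx.Process(target=counter) for _ in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return (time.time() - t1) * 1000.0

if __name__ == '__main__':
    # 'fork' is what Linux and OS X used by default; 'spawn' is what
    # Windows has to do, starting a fresh interpreter per child.
    for method in ('fork', 'spawn'):
        ctx = multiprocessing.get_context(method)
        print('%s took %.1f ms' % (method, timed_run(ctx)))
```

On a typical Linux box the 'spawn' run is noticeably slower, because each child pays for interpreter startup and module re-import instead of a cheap copy-on-write fork.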

See what happens when you import psyco. First, easy_install it:

C:\Users\hughdbrown>\Python26\scripts\easy_install.exe psyco
Searching for psyco
Best match: psyco 1.6
Adding psyco 1.6 to easy-install.pth file

Using c:\python26\lib\site-packages
Processing dependencies for psyco
Finished processing dependencies for psyco

Add this to the top of your Python script:

import psyco
psyco.full()

I get these results without:

serialrun took 1191.000 ms
parallelrun took 3738.000 ms
threadedrun took 2728.000 ms

I get these results with:

serialrun took 43.000 ms
parallelrun took 3650.000 ms
threadedrun took 265.000 ms

Parallel is still slow, but the others burn rubber.

Edit: also, try it with the multiprocessing pool. (This is my first time trying this and it is so fast, I figure I must be missing something.)

@print_timing
def parallelpoolrun(reps):
    pool = multiprocessing.Pool(processes=4)
    # apply_async() returns at once; the AsyncResult is never waited on,
    # and counter() takes no arguments, so the worker's TypeError is
    # silently stored in `result` rather than raised here.
    result = pool.apply_async(counter, (reps,))

Results:

C:\Users\hughdbrown\Documents\python\StackOverflow>python 1289813.py
serialrun took 57.000 ms
parallelrun took 3716.000 ms
parallelpoolrun took 128.000 ms
threadedrun took 58.000 ms
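The thing being missed: apply_async() returns immediately, so the decorator only times task submission (mostly pool startup), not the work itself, and the TypeError from passing an argument to the zero-argument counter() is swallowed because nothing ever get()s the result. A version that genuinely waits for the work might look like this (a Python 3 sketch; counter is adjusted to take its loop count as a parameter):

```python
import multiprocessing
import time

def counter(n):
    # CPU-bound busy loop, parameterised so pool.map can feed it work
    for i in range(n):
        pass

def print_timing(func):
    def wrapper(*args):
        t1 = time.time()
        res = func(*args)
        t2 = time.time()
        print('%s took %0.3f ms' % (func.__name__, (t2 - t1) * 1000.0))
        return res
    return wrapper

@print_timing
def parallelpoolrun(reps):
    # map() blocks until every task has finished, so the timing is honest,
    # and any exception raised in a worker is re-raised here, not lost.
    with multiprocessing.Pool(processes=4) as pool:
        pool.map(counter, [1000000] * reps)

if __name__ == '__main__':
    parallelpoolrun(50)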
answered Sep 22 '22 by hughdbrown


Processes are much more lightweight under UNIX variants. Windows processes are heavy and take much more time to start up. Threads are the recommended way of parallelizing work on Windows.
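That start-up cost is easy to measure directly. A sketch in Python 3, timing do-nothing workers so only creation and teardown are counted (on Windows, expect the process figure to dwarf the thread figure far more than on Linux):

```python
import multiprocessing
import threading
import time

def noop():
    pass

def startup_ms(make_worker, n=10):
    # Time how long it takes to start and join n do-nothing workers
    t1 = time.time()
    workers = [make_worker() for _ in range(n)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return (time.time() - t1) * 1000.0

if __name__ == '__main__':
    print('threads:   %.1f ms' % startup_ms(lambda: threading.Thread(target=noop)))
    print('processes: %.1f ms' % startup_ms(lambda: multiprocessing.Process(target=noop)))
```

Since the workers do no actual work, any gap between the two numbers is pure per-worker overhead, which is exactly the cost the question's parallelrun pays 50 times.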

answered Sep 22 '22 by Byron Whitlock