Python Multithreading vs Multiprocessing vs Sequential Execution

I have the below code:

import time
from threading import Thread
from multiprocessing import Process 

def fun1():
    for _ in xrange(10000000):
        print 'in fun1'

def fun2():
    for _ in xrange(10000000):
        print 'in fun2'

def fun3():
    for _ in xrange(10000000):
        print 'in fun3'

def fun4():
    for _ in xrange(10000000):
        print 'in fun4'

if __name__ == '__main__':

  #t1 = Thread(target=fun1, args=())
  t1 = Process(target=fun1, args=())
  #t2 = Thread(target=fun2, args=())
  t2 = Process(target=fun2, args=())
  #t3 = Thread(target=fun3, args=())
  t3 = Process(target=fun3, args=())
  #t4 = Thread(target=fun4, args=())
  t4 = Process(target=fun4, args=())
  t1.start()
  t2.start() 
  t3.start() 
  t4.start()
  start = time.clock()
  t1.join()
  t2.join()
  t3.join()
  t4.join()
  end = time.clock()
  print("Time Taken = ",end-start)

  '''
  start = time.clock()
  fun1()
  fun2()
  fun3()
  fun4()
  end = time.clock()
  print("Time Taken = ",end-start)
  '''

I ran the above program in three ways:

  • First, sequential execution ALONE (use the commented-out code and comment out the code above it)
  • Second, multithreaded execution ALONE
  • Third, multiprocessing execution ALONE

The observed values of end - start are as follows:

Overall Running times

  • ('Time Taken = ', 342.5981313667716) --- Running time by threaded execution
  • ('Time Taken = ', 232.94691744899296) --- Running time by sequential Execution
  • ('Time Taken = ', 307.91093406618216) --- Running time by Multiprocessing execution

Question :

I see that sequential execution takes the least time and multithreading takes the most. Why? I am unable to understand this and am surprised by the results. Please clarify.

Since this is a CPU-intensive task and the GIL is acquired, my understanding was that multiprocessing would take the least time while threaded execution would take the most. Please validate my understanding.

fsociety asked Aug 26 '16 06:08

1 Answer

You use time.clock, which (on Unix) gives you CPU time and not real (wall-clock) time. You can't use it in your case, because it measures execution time (how long the CPU spent running your code, which will be almost the same in each of these cases).

Running your code with time.time() instead of time.clock gave me these times on my computer:
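To make the difference between CPU time and wall-clock time concrete, here is a minimal sketch. It assumes Python 3, where time.clock no longer exists (it was removed in Python 3.8), so time.process_time stands in for the CPU timer and time.perf_counter for the wall-clock timer. The helper name measure is hypothetical.

```python
import time

def measure(fn):
    """Return (wall_seconds, cpu_seconds) spent while running fn()."""
    wall_start = time.perf_counter()   # wall-clock timer
    cpu_start = time.process_time()    # CPU time of this process only
    fn()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu

if __name__ == "__main__":
    # Sleeping uses almost no CPU, so the two timers disagree strongly.
    wall, cpu = measure(lambda: time.sleep(0.5))
    print("sleeping:  wall=%.3f s  cpu=%.3f s" % (wall, cpu))

    # A pure computation keeps the CPU busy, so the two timers are close.
    wall, cpu = measure(lambda: sum(i * i for i in range(2000000)))
    print("computing: wall=%.3f s  cpu=%.3f s" % (wall, cpu))
```

Note that a CPU timer taken in the parent only covers the parent process, so it does not count work done in child processes started with multiprocessing.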

Process : ('Time Taken = ', 5.226783990859985)
seq : ('Time Taken = ', 6.3122560000000005)
Thread :  ('Time Taken = ', 17.10062599182129)

The task given here (printing) is so fast that the speedup from using multiprocessing is almost balanced by the overhead.

With threading, only one thread can run at a time because of the GIL, so you end up running all your functions sequentially, BUT with the added overhead of threading (switching threads every few iterations can cost up to several milliseconds each time). So you end up with something much slower.
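The three-way comparison can be sketched roughly as follows. This is not the code that produced the numbers above: it assumes Python 3, replaces the print loop with a plain counting loop, and uses a smaller, arbitrarily chosen iteration count N. The helper names (count, run_sequential, run_concurrent, wall_time) are hypothetical. The timer is started before the workers are started, so startup cost is included.

```python
import time
from threading import Thread
from multiprocessing import Process

N = 2000000  # assumed iteration count, chosen only to keep the run short

def count(n=N):
    """CPU-bound stand-in for the question's fun1..fun4."""
    total = 0
    for i in range(n):
        total += i
    return total

def run_sequential(workers=4, n=N):
    for _ in range(workers):
        count(n)

def run_concurrent(worker_cls, workers=4, n=N):
    """worker_cls is either Thread or Process; both share this interface."""
    tasks = [worker_cls(target=count, args=(n,)) for _ in range(workers)]
    for task in tasks:
        task.start()
    for task in tasks:
        task.join()

def wall_time(fn, *args):
    """Wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

if __name__ == "__main__":
    print("seq     :", wall_time(run_sequential))
    print("Thread  :", wall_time(run_concurrent, Thread))
    print("Process :", wall_time(run_concurrent, Process))
```

The absolute numbers depend on the machine, the Python version, and the number of cores, so only the relative ordering is worth reading from the output.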

Threading is useful if you have waiting times, so you can run other tasks in between.
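As a small illustration of the waiting case, the sketch below uses time.sleep as a stand-in for an I/O wait (an assumption; a real program would wait on a socket, a file, or a database). The function names wait_task and run_threaded are hypothetical. Sleeping releases the GIL, so the waits overlap.

```python
import time
from threading import Thread

def wait_task(seconds):
    """Stand-in for an I/O wait; sleeping releases the GIL."""
    time.sleep(seconds)

def run_threaded(n_tasks, seconds):
    """Run n_tasks waits in parallel threads and return elapsed wall time."""
    threads = [Thread(target=wait_task, args=(seconds,)) for _ in range(n_tasks)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    # Four 0.5 s waits run one after another would take about 2 s;
    # in threads they overlap, so the total stays close to 0.5 s.
    print("threaded waits: %.2f s" % run_threaded(4, 0.5))
```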

Multiprocessing is useful for computationally expensive tasks that are, if possible, completely independent (no shared variables). If you need to share variables, then you have to face the GIL and it's a little bit more complicated (but not impossible most of the time).
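For the shared-variable case, one possible approach is multiprocessing.Value, which keeps a single value in shared memory together with a lock. The sketch below assumes that one integer counter is all that needs to be shared; the function names add_many and run_shared_counter are hypothetical.

```python
from multiprocessing import Process, Value

def add_many(counter, n):
    """Increment the shared counter n times, holding its lock each time."""
    for _ in range(n):
        with counter.get_lock():
            counter.value += 1

def run_shared_counter(workers=4, n=1000):
    counter = Value("i", 0)  # a C int living in shared memory
    procs = [Process(target=add_many, args=(counter, n)) for _ in range(workers)]
    for proc in procs:
        proc.start()
    for proc in procs:
        proc.join()
    return counter.value

if __name__ == "__main__":
    print("final counter:", run_shared_counter())
```

Taking the lock on every increment is slow, which is one reason independent tasks that only return a result at the end tend to be the better fit for multiprocessing.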

EDIT: Actually, using time.clock as you did tells you how much overhead threading and multiprocessing cost you.

CoMartel answered Sep 23 '22 02:09
