Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

python multi-threading slower than serial?

I'm trying to figure out multi-threading programming in python. Here's the simple task with which I want to compare serial and parallel speeds.

import threading
import Queue
import time
import math

def sinFunc(offset, n):
  result = []
  for i in range(n):
    result.append(math.sin(offset + i * i))
  return result

def timeSerial(k, n):
  t1 = time.time()    
  answers = []
  for i in range(k):
    answers.append(sinFunc(i, n))
  t2 = time.time()
  print "Serial time elapsed: %f" % (t2-t1)

class Worker(threading.Thread):

  def __init__(self, queue, name):
    self.__queue = queue
    threading.Thread.__init__(self)
    self.name = name

  def process(self, item):
    offset, n = item
    self.__queue.put(sinFunc(offset, n))
    self.__queue.task_done()
    self.__queue.task_done()

  def run(self):
    while 1:
        item = self.__queue.get()
        if item is None:
            self.__queue.task_done()
            break
        self.process(item)

def timeParallel(k, n, numThreads):
  t1 = time.time()    
  queue = Queue.Queue(0)
  for i in range(k):
    queue.put((i, n))
  for i in range(numThreads):
    queue.put(None)    
  for i in range(numThreads):
    Worker(queue, i).start()
  queue.join()
  t2 = time.time()
  print "Serial time elapsed: %f" % (t2-t1)

if __name__ == '__main__':

  n = 100000
  k = 100
  numThreads = 10

  timeSerial(k, n)
  timeParallel(k, n, numThreads)

#Serial time elapsed: 2.350883
#Serial time elapsed: 2.843030

Can someone explain to me what's going on? I'm used to C++, and a similar version of this using the module sees the speed-up we would expect.

like image 575
andyInCambridge Avatar asked May 28 '12 18:05

andyInCambridge


People also ask

Why is multithreading in Python slow?

This is due to the Python GIL being the bottleneck preventing threads from running completely concurrently. The best possible CPU utilisation can be achieved by making use of the ProcessPoolExecutor or Process modules which circumvents the GIL and make code run more concurrently.

Is multithreading faster than multiprocessing Python?

Both multithreading and multiprocessing allow Python code to run concurrently. Only multiprocessing will allow your code to be truly parallel. However, if your code is IO-heavy (like HTTP requests), then multithreading will still probably speed up your code.

Why Python is not good for multithreading?

Python doesn't support multi-threading because Python on the Cpython interpreter does not support true multi-core execution via multithreading. However, Python does have a threading library. The GIL does not prevent threading.

Why is my multithreading slower?

Every thread needs some overhead and system resources, so it also slows down performance. Another problem is the so called "thread explosion" when MORE thread are created than cores are on the system. And some waiting threads for the end of other threads is the worst idea for multi threading.


1 Answers

Other answers have referred to the issue of the GIL being the problem in cpython. But I felt there was a bit of missing information. This will cause you performance issues in situations where the code you are running in threads is CPU bound. In your case here, yes doing many calculations in threads is going to most likely result in dramatically degraded performance.

But, if you were doing something that was more IO bound, such as reading from many sockets in a network application, or calling out to subprocess, you can get performance increases from threads. A simple example for your code above would be to add a stupidly simple call out to the shell:

import os

def sinFunc(offset, n):
  result = []
  for i in xrange(n):
    result.append(math.sin(offset + i * i))
  os.system("echo 'could be a database query' >> /dev/null; sleep .1")
  return result

That call might have been something real like waiting on the filesystem. But you can see that in this example, threading will start to prove beneficial, as the GIL can be released when the thread is waiting on IO and other threads will continue to process. Even so, there is still a sweet spot for when more threads start to become negated by the overhead of creating them and synchronizing them.

For CPU bound code, you would make use of multiprocessing

From article: http://www.informit.com/articles/article.aspx?p=1850445&seqNum=9

...threading is more appropriate for I/O-bound applications (I/O releases the GIL, allowing for more concurrency)...

Similar question references about threads vs processes:
https://stackoverflow.com/a/1227204/496445
https://stackoverflow.com/a/990436/496445

like image 112
jdi Avatar answered Oct 06 '22 07:10

jdi