
Python: cost of locking vs. performance (does multithreading make sense?)

I'm working on a project where the throughput of my code is quite important, and after some consideration I chose to make my program threaded.

The main thread and a subthread both add to and remove from two shared dictionaries. I've been searching the web for input on the performance of locking in Python: is it a slow operation, etc.?

So what I'm getting at is this: since Python isn't actually running threads in parallel (the GIL lets only one thread execute bytecode at a time, on one core), do I gain anything by making my application threaded, other than for handling IO, if I need high performance?

EDIT

The actual question is (after an insightful comment):

Does multithreading make sense in python, since there's GIL?
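A quick way to see the effect being asked about (my own sketch, not from the question): time a CPU-bound function run serially and then split across two threads. Under the GIL, the threaded run is typically no faster, since only one thread executes Python bytecode at a time.

```python
import threading
import time

def count(n):
    # Pure-Python CPU-bound loop; the GIL is held throughout.
    while n > 0:
        n -= 1

N = 2_000_000

# Serial: two calls, one after the other.
start = time.perf_counter()
count(N)
count(N)
serial = time.perf_counter() - start

# Threaded: the same two calls in parallel threads.
start = time.perf_counter()
t1 = threading.Thread(target=count, args=(N,))
t2 = threading.Thread(target=count, args=(N,))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"serial: {serial:.3f}s  threaded: {threaded:.3f}s")
```

On CPython you will usually see the threaded time equal to or worse than the serial time for this kind of workload.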

Daniel Figueroa asked Aug 15 '12

2 Answers

IMO, locking hurts performance mostly when multiple threads are actually waiting for the lock.

The cost of acquiring and releasing an uncontended lock should be trivial.

The measurements below demonstrate this.

Ok, here is the cost of acquiring and releasing an uncontended lock under Linux, with Python 3.2:

$ python3 -m timeit \
  -s "from threading import Lock; l=Lock(); a=l.acquire; r=l.release" \
  "a(); r()"

10000000 loops, best of 3: 0.127 usec per loop

And here is the cost of calling a dummy Python function:

$ python3 -m timeit -s "def a(): pass" "a(); a()"

1000000 loops, best of 3: 0.221 usec per loop

And here is the cost of calling a trivial C function (which returns the False singleton):

$ python3 -m timeit -s "a=bool" "a(); a()"

10000000 loops, best of 3: 0.164 usec per loop

Also, note that using the lock as a context manager is actually slower, not faster as you might imagine:

$ python3 -m timeit -s "from threading import Lock; l=Lock()" \
  "with l: pass"

1000000 loops, best of 3: 0.242 usec per loop

At least under Linux, there doesn't seem to be a lot of room for improvement in lock performance, to say the least.

PS: RLock is now as fast as Lock:

$ python3 -m timeit \
  -s "from threading import RLock; l=RLock(); a=l.acquire; r=l.release" \
  "a(); r()"

10000000 loops, best of 3: 0.114 usec per loop
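To connect these numbers back to the question, here is a minimal sketch (the names are mine) of the shared-dictionary pattern with a single Lock guarding access. Per the timings above, each acquire/release pair costs well under a microsecond when uncontended.

```python
import threading

shared = {}
lock = threading.Lock()

def worker(tid):
    # Each write is guarded by the lock; keys are disjoint per thread.
    for i in range(1000):
        with lock:
            shared[(tid, i)] = i

threads = [threading.Thread(target=worker, args=(t,)) for t in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))  # 2000
```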
Jcyrss answered Sep 20 '22


First of all, locking in any language is a performance bottleneck. Minimize locking where possible; don't use shared dictionaries, for example. Create a tree instead and have each thread work in a different branch of that tree.
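A minimal sketch of that partitioning idea, using dictionaries as in the question (the structure and names are my own): each thread fills a private dict, and the main thread merges them after join(), so the hot loop needs no lock at all.

```python
import threading

def worker(out, tid):
    # out is private to this thread, so no lock is needed here.
    for i in range(1000):
        out[(tid, i)] = i * i

parts = [{} for _ in range(4)]
threads = [threading.Thread(target=worker, args=(p, t))
           for t, p in enumerate(parts)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Merge the per-thread results once, after all workers are done.
merged = {}
for p in parts:
    merged.update(p)

print(len(merged))  # 4000
```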

Since you'll be doing a lot of I/O, your performance problems will lie there and threading is not necessarily going to improve matters. Look into event-driven architectures first:

  • The stdlib asyncore module
  • twisted
  • eventlets
  • greenlets
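As a rough illustration of the event-driven style these libraries offer (the example is mine, using the stdlib selectors module, a later stdlib alternative to asyncore): a single thread multiplexes several sockets instead of blocking on each one.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

# A connected pair of sockets stands in for a real client/server.
a, b = socket.socketpair()
a.setblocking(False)
b.setblocking(False)
sel.register(b, selectors.EVENT_READ)

a.send(b"ping")

# The event loop: wait for readiness, then handle the ready socket.
for key, events in sel.select(timeout=1):
    data = key.fileobj.recv(1024)
    print(data)  # b'ping'

sel.close()
a.close()
b.close()
```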

The GIL is not likely to be your problem here; it'll be released whenever a thread enters C code, for example (almost certainly during any I/O call). If it ever does become a bottleneck, move to multiple processes. On a large intranet cluster I administer, for example, we run 6 processes of 2 threads each to make full use of all the CPU cores (2 of the processes carry a very light load).

If you feel you need multiple processes, either use the multiprocessing module or make it easy to start multiple instances of your server (each listening on a different port) and use a load balancer such as haproxy to direct traffic to each server.

Martijn Pieters answered Sep 22 '22