
How to maximize performance in Python when doing many I/O bound operations?

Tags:

python

I have a situation where I'm downloading a lot of files. Right now everything runs on one main Python thread, and downloads as many as 3000 files every few minutes. The problem is that the time it takes to do this is too long. I realize Python has no true multi-threading, but is there a better way of doing this? I was thinking of launching multiple threads since the I/O bound operations should not require access to the global interpreter lock, but perhaps I misunderstand that concept.

asked Jun 04 '10 by Rhubarb


1 Answer

Multithreading is just fine for the specific purpose of speeding up I/O on the net (although asynchronous programming would give even greater performance). CPython's multithreading is quite "true" (native OS threads) -- what you're probably thinking of is the GIL, the global interpreter lock that stops different threads from simultaneously running Python code. But all the I/O primitives give up the GIL while they're waiting for system calls to complete, so the GIL is not relevant to I/O performance!

For asynchronous programming, the most powerful framework around is Twisted, but it can take a while to get the hang of it if you've never done such programming. It would probably be simpler for you to get extra I/O performance by using a pool of threads.
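
As a minimal sketch of that thread-pool idea (not part of the original answer), the standard library's concurrent.futures.ThreadPoolExecutor can drive a pool of downloader threads; the URL list, pool size, and download helper below are illustrative placeholders:

    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    def download(url):
        # Each worker blocks on network I/O; the GIL is released during the
        # underlying socket calls, so many downloads overlap in real time.
        with urllib.request.urlopen(url) as response:
            return url, response.read()

    # Placeholder URLs and pool size; tune max_workers for your workload.
    urls = ["http://example.com/file1", "http://example.com/file2"]

    with ThreadPoolExecutor(max_workers=20) as pool:
        for url, data in pool.map(download, urls):
            print(f"{url}: {len(data)} bytes")

Here pool.map yields results in input order; concurrent.futures.as_completed is an alternative when you want to handle whichever download finishes first.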

answered Nov 15 '22 by Alex Martelli