
What threading module should I use to prevent disk IO from blocking network IO?

I have a Python application that, to be brief, receives data from a remote server, processes it, responds to the server, and occasionally saves the processed data to disk. The problem I've encountered is that there is a lot of data to write, and the save process can take upwards of half a minute. This is apparently a blocking operation, so the network IO is stalled during this time. I'd like the save operation to take place in the background, so to speak, so that the application can continue to communicate with the server reasonably quickly.

I know that I probably need some kind of threading module to accomplish this, but I can't tell what the differences are between thread, threading, multiprocessing, and the various other options. Does anybody know what I'm looking for?

ashastral asked Oct 16 '10 20:10



1 Answer

Since you're I/O bound, use the threading module.

You should almost never need to use thread directly. It is a low-level interface (renamed _thread in Python 3), and the threading module is a high-level wrapper around it.
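As a minimal sketch of what this could look like for the question's case, the block below hands save jobs to a single background thread through a queue, so the network loop only pays for a quick put() call. The names save_worker and start_save_worker and the (path, data) job format are assumptions made for illustration, not anything from the asker's application.

```python
import queue
import threading

def save_worker(jobs):
    """Write queued (path, data) pairs to disk until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work, let the thread finish
            break
        path, data = job
        # Blocking file I/O releases the GIL, so other threads keep running.
        with open(path, "wb") as f:
            f.write(data)

def start_save_worker():
    """Start the background writer and return (jobs, thread)."""
    jobs = queue.Queue()
    thread = threading.Thread(target=save_worker, args=(jobs,), daemon=True)
    thread.start()
    return jobs, thread
```

The network loop would call jobs.put((path, data)) whenever a save is due and carry on immediately. Because the thread is a daemon, it is killed at interpreter exit, so at shutdown it is safer to call jobs.put(None) followed by thread.join() so that a write in progress is not cut short.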

The multiprocessing module is different from the threading module: it runs a task in multiple separate processes rather than in threads of one process. It deliberately mirrors the threading interface to reduce the learning curve. multiprocessing is typically used when you have CPU-bound calculations and need to avoid the GIL (Global Interpreter Lock) on a multicore CPU.
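To illustrate the shared interface, here is a small sketch in which the same helper runs a function either in a thread or in a process, depending only on which class is passed in. The checksum function is a made-up stand-in for CPU-bound work, and run_with is a hypothetical helper name.

```python
import multiprocessing
import threading

def checksum(data, out):
    """CPU-bound stand-in: sum the bytes and report the result via a queue."""
    out.put(sum(data) % 256)

def run_with(worker_cls, data):
    """Run checksum() in a Thread or a Process; the calling code is identical."""
    out = multiprocessing.Queue()  # usable from both threads and processes
    worker = worker_cls(target=checksum, args=(data, out))
    worker.start()
    result = out.get()  # fetch the result before join() so the queue is drained
    worker.join()
    return result
```

run_with(threading.Thread, payload) and run_with(multiprocessing.Process, payload) differ only in the class argument. The process variant should be called from under an if __name__ == "__main__": guard, since some platforms start child processes by re-importing the main module.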

A somewhat more esoteric alternative to multi-threading is asynchronous I/O using the asyncore module. Other options include Stackless Python and Twisted.
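For readers on current Python versions: asyncore was removed from the standard library in Python 3.12, so the sketch below uses asyncio instead. That is a substitution on my part, not the module this answer names. It assumes Python 3.9 or later for asyncio.to_thread, and the function names and the sleep that stands in for network work are hypothetical.

```python
import asyncio

def save_blocking(path, data):
    """Ordinary blocking write; returns the number of bytes written."""
    with open(path, "wb") as f:
        f.write(data)
    return len(data)

async def serve_while_saving(path, data):
    """Run the save in a worker thread while the event loop stays responsive."""
    save_task = asyncio.create_task(asyncio.to_thread(save_blocking, path, data))
    ticks = 0
    while not save_task.done():
        ticks += 1                 # stand-in for handling network traffic
        await asyncio.sleep(0.01)  # yield control back to the event loop
    return await save_task, ticks
```

Calling asyncio.run(serve_while_saving(path, data)) returns the byte count together with how many times the loop got to run while the write was in flight. Note that the disk write still happens in a thread here; the event loop only coordinates it.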

Lie Ryan answered Oct 13 '22 11:10
