 

I have a serial Python application that takes hours to process; how can I decrease the time it takes to run?

Tags:

python

Could someone please post a few examples of multi-threaded python? I am searching all over the internet but cannot find a simple, easy-to-replicate tutorial. Simple examples are fine.

I have written a program which takes a few hours to run serially -- I am hoping I can bring its run time down to minutes after multi-threading it.

asked Dec 29 '22 by Nick


1 Answer

I see you got a lot of examples, all so far from @Noctis, but I'm not sure how they're going to help you. Addressing your question more directly: the only way multithreading can speed up your application, in today's CPython, is if your slow-down is due in good part to "blocking I/O" operations, e.g. interactions with DB servers, mail servers, websites, and so on. (A powerful alternative for speeding up I/O is asynchronous, a.k.a. event-driven, programming, for which the richest Python framework is Twisted -- but it can be harder to learn if you've never done event-driven coding.)
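For instance, here is a minimal sketch of the I/O-bound case (my illustration, not from the question; the URLs are placeholders): while one thread is blocked waiting on the network, the others keep running.

    # Threads help here because urlopen() blocks on network I/O,
    # during which CPython releases the GIL so other threads can run.
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    urls = [
        "https://example.com/a",   # placeholder URLs
        "https://example.com/b",
        "https://example.com/c",
    ]

    def fetch(url):
        with urlopen(url) as resp:
            return url, len(resp.read())

    # Several downloads in flight at once, one per worker thread.
    with ThreadPoolExecutor(max_workers=10) as pool:
        for url, size in pool.map(fetch, urls):
            print(url, size)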

Even if you have many cores in your machine, one multi-threaded Python process will use only one of them at a time, except when it's executing specially coded extensions (typically in C, C++, Cython, and the like) which "release the GIL" (the global interpreter lock) when feasible.
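A quick experiment (again my sketch, not a claim about your program) makes the GIL's effect concrete: threading a CPU-bound, pure-Python loop buys you nothing.

    import time
    from threading import Thread

    def count(n):
        # Pure-Python, CPU-bound work: holds the GIL the whole time.
        while n > 0:
            n -= 1

    N = 10_000_000

    start = time.time()
    count(N)
    count(N)
    print("serial:   %.2fs" % (time.time() - start))

    start = time.time()
    threads = [Thread(target=count, args=(N,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("threaded: %.2fs" % (time.time() - start))
    # On CPython the threaded run is typically no faster (often slower),
    # since only one thread executes Python bytecode at a time.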

If you do have many cores, multiprocessing (a module whose interface is designed to look a lot like threading's) can indeed speed up your program. There are many other packages supporting "symmetric multi-processor" distributed programming (see the list here), but, out of all of them, multiprocessing is the one that comes as part of the standard library -- a very convenient thing. If you have multiple computers with a fast LAN between them, you should also consider the more general approach of distributed processing, which could let you use all of your available computers for the same task (some of these packages are also listed at the previous URL I gave, under the "cluster computing" header).
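A minimal multiprocessing sketch (my example; square() is just a stand-in for whatever per-item work your program really does), showing how close the interface is to threading:

    from multiprocessing import Pool

    def square(n):
        # Stand-in for the real per-item computation.
        return n * n

    if __name__ == "__main__":   # guard needed where processes are spawned
        with Pool(processes=4) as pool:   # roughly one worker per core
            results = pool.map(square, range(100))
        print(results[:5])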

What speed-up you can get for any number of available cores or computers ultimately depends on the nature of your problems -- and, if the problems per se are suitable for it, then also on the algorithms and data structures you're using... not all will speed up well (it ranges from "embarrassingly parallel" problems such as ray-tracing, which speed up linearly all the way, to "intrinsically serial" ones where 100 machines won't be any faster than one). So, it's hard to advise you further without understanding the nature of your problems; care to explain that?
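As a rough way to quantify that ceiling (my addition, not part of the answer above), Amdahl's law says that if a fraction s of the work is intrinsically serial, n workers can never give more than a 1 / (s + (1 - s) / n) speed-up:

    def max_speedup(serial_fraction, workers):
        # Amdahl's law: the serial part sets a hard ceiling on speed-up.
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

    print(max_speedup(0.05, 100))   # ~16.8x -- even 5% serial work caps 100 workers
    print(max_speedup(0.0, 100))    # 100.0  -- embarrassingly parallel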

answered Apr 06 '23 by Alex Martelli