 

Terminate long-running Python threads

What is the recommended way to terminate unexpectedly long-running threads in Python? I can't use SIGALRM, since the signal module documentation warns:

Some care must be taken if both signals and threads are used in the same program. The fundamental thing to remember in using signals and threads simultaneously is: always perform signal() operations in the main thread of execution. Any thread can perform an alarm(), getsignal(), pause(), setitimer() or getitimer(); only the main thread can set a new signal handler, and the main thread will be the only one to receive signals (this is enforced by the Python signal module, even if the underlying thread implementation supports sending signals to individual threads). This means that signals can't be used as a means of inter-thread communication. Use locks instead.

Update: each thread in my case blocks: it is downloading a web page using the urllib2 module, and sometimes the operation takes too much time on extremely slow sites. That's why I want to terminate such slow threads.
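The kind of blocking call in question looks something like this (a minimal illustration; the URL is hypothetical):

import urllib2

# blocks until the server responds or the connection drops;
# on an extremely slow site this can hang the thread indefinitely
data = urllib2.urlopen("http://example.com/slow-page").read()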

asked Aug 04 '09 by Konstantin

2 Answers

Since abruptly killing a thread that's in a blocking call is not feasible, a better approach, when possible, is to avoid using threads in favor of other multi-tasking mechanisms that don't suffer from such issues.

For the OP's specific case (the threads' job is to download web pages, and some threads block forever on misbehaving sites), the ideal solution is Twisted -- as it generally is for networking tasks. In other cases, multiprocessing might be better.

More generally, when threads present unsolvable issues, I recommend switching to other multitasking mechanisms rather than attempting heroic measures to make threads perform tasks for which, at least in CPython, they're unsuitable.
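For a sense of what the Twisted approach looks like, here is a minimal sketch using the getPage helper from Twisted releases of that era (it has since been deprecated), which accepted a timeout argument; the URL and timeout value are illustrative:

from twisted.internet import reactor
from twisted.web.client import getPage  # deprecated in modern Twisted

def on_page(data):
    print "fetched %d bytes" % len(data)
    reactor.stop()

def on_error(failure):
    # fires on connection errors and on timeout alike
    print "download failed:", failure.getErrorMessage()
    reactor.stop()

# getPage returns a Deferred; no thread ever blocks, so a slow
# site simply triggers the errback once the timeout elapses
d = getPage("http://example.com/slow-page", timeout=10)
d.addCallbacks(on_page, on_error)
reactor.run()

Because nothing blocks, there is no thread to kill: a hung download is just a Deferred whose errback fires on timeout.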

answered by Alex Martelli


As Alex Martelli suggested, you could use the multiprocessing module. Its API is very similar to that of the threading module, so it should be easy to get started with. Your code could look like this, for example:

import multiprocessing

def get_page(*args, **kwargs):
    # your web page downloading code goes here
    pass

def start_get_page(timeout, *args, **kwargs):
    p = multiprocessing.Process(target=get_page, args=args, kwargs=kwargs)
    p.start()
    p.join(timeout)  # wait at most `timeout` seconds for the child to finish
    if p.is_alive():
        # stop the downloading 'thread' -- really a process,
        # so unlike a thread it can be killed safely
        p.terminate()
        # and then do any post-error processing here

if __name__ == "__main__":
    # illustrative call: a 10-second timeout; any further positional
    # or keyword arguments are forwarded to get_page
    start_get_page(10)

Of course, you need some way to get the return value of your page-downloading code back to the parent. For that you can use multiprocessing.Pipe or multiprocessing.Queue (among the other mechanisms multiprocessing provides); there is more information, along with samples, in the multiprocessing documentation.
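For example, here is a minimal sketch extending the code above to pass the result back through a multiprocessing.Queue (the url and result_queue parameters are illustrative, not part of the original code):

import multiprocessing
import urllib2
import Queue  # for the Empty exception raised by Queue.get

def get_page(url, result_queue):
    # runs in the child process; puts the page body on the queue
    result_queue.put(urllib2.urlopen(url).read())

def start_get_page(url, timeout):
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=get_page, args=(url, q))
    p.start()
    try:
        # wait for the result itself rather than joining first, so a
        # large page still being pushed through the queue isn't
        # mistaken for a hang
        result = q.get(timeout=timeout)
    except Queue.Empty:
        result = None  # the download took too long
    if p.is_alive():
        p.terminate()
    p.join()
    return result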

Lastly, the multiprocessing module is included in Python 2.6. Backports for Python 2.5 and 2.4 are available on PyPI (you can use easy_install multiprocessing), or you can visit PyPI and download and install the package manually.

Note: I realize this was posted a while ago. I was having a similar problem, stumbled on this question, and saw Alex Martelli's suggestion. I implemented it for my problem and decided to share it. (I'd like to thank Alex for pointing me in the right direction.)

answered by Vin-G