Self-repairing Python threads

I've created a web spider that accesses both a US and an EU server. The two servers share the same data structure but hold different data, and I want to collate it all. To be nice to the servers, there's a wait time between requests. Since the program is exactly the same for both, I've threaded it so it can access the EU and US servers simultaneously and speed up processing.

This crawling will take on the order of weeks, not days. There will be exceptions, and while I've tried to handle everything inside the program, it's likely something weird might crop up. To be truly defensive about this, I'd like to catch a thread that's failed, log the error and restart it. Worst case I lose a handful of pages out of thousands, which is better than having a thread fail and lose 50% of speed. However, from what I've read, Python threads die silently. Does anyone have any ideas?

import threading

import QueueManager  # the asker's own module (not shown)


class AccessServer(threading.Thread):
    def __init__(self, site):
        threading.Thread.__init__(self)
        self.site = site
        self.qm = QueueManager.QueueManager(site)

    def run(self):
        # Do stuff here
        pass


def main():
    us_thread = AccessServer(u"us")
    us_thread.start()

    eu_thread = AccessServer(u"eu")
    eu_thread.start()


if __name__ == "__main__":
    main()

cflewis asked Dec 22 '22 12:12


2 Answers

Just use a try: ... except: ... block in the run method. If something weird causes the thread to fail, the exception will most likely be raised somewhere in your own code (as opposed to in the threading subsystem itself), so you can catch it, log it, and restart the thread. It's your call whether to actually shut down the thread and start a new one, or just wrap the try/except block in a while loop so the same thread keeps running.
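
A minimal sketch of the while-loop variant, assuming the work can be split into a per-page call. ResilientWorker, crawl_one_page, stop_event, and failures are hypothetical names for illustration, not part of the asker's code:

```python
import logging
import threading

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("spider")


class ResilientWorker(threading.Thread):
    """Hypothetical worker that keeps crawling until asked to stop."""

    def __init__(self, site, crawl_one_page):
        super().__init__(name="worker-" + site)
        self.site = site
        self.crawl_one_page = crawl_one_page  # stand-in for the real work
        self.stop_event = threading.Event()   # set this to end the loop
        self.failures = 0

    def run(self):
        while not self.stop_event.is_set():
            try:
                self.crawl_one_page(self.site)
            except Exception:
                # Record the full traceback, then move on to the next page
                # instead of letting the thread die.
                self.failures += 1
                log.exception("worker %s failed; continuing", self.site)
```

Catching Exception rather than using a bare except keeps SystemExit from being swallowed, and log.exception writes the traceback, so the failure is no longer silent.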

Another solution, if you suspect that something really weird might happen which you can't detect through Python's error handling mechanism, would be to start a monitor thread that periodically checks to see that the other threads are running properly.
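
One way such a monitor might look, as a rough sketch: it polls Thread.is_alive() and starts a replacement for any thread that has ended. The supervise function and the factories mapping are assumptions made for this example. Note that it cannot tell a crash from a normal return, so it suits workers that are meant to run indefinitely.

```python
import threading


def supervise(factories, poll_seconds=1.0, stop_event=None):
    """Start one thread per factory and replace any thread that has ended.

    factories maps a name to a zero-argument callable that returns a new,
    unstarted threading.Thread (a Thread object cannot be started twice).
    Returns the number of restarts per name once stop_event is set.
    """
    if stop_event is None:
        stop_event = threading.Event()
    threads = {name: make() for name, make in factories.items()}
    for thread in threads.values():
        thread.start()
    restarts = {name: 0 for name in factories}

    # Event.wait returns False on timeout, True once the event is set.
    while not stop_event.wait(poll_seconds):
        for name, thread in list(threads.items()):
            if not thread.is_alive():
                threads[name] = factories[name]()
                threads[name].start()
                restarts[name] += 1
    return restarts
```

With the class from the question, a call could be supervise({"us": lambda: AccessServer(u"us"), "eu": lambda: AccessServer(u"eu")}) run from the main thread.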

David Z answered Dec 29 '22 01:12


Can you have, e.g., the main thread function as a monitoring thread? For example, require that each worker thread regularly update a thread-specific timestamp, and if a thread hasn't updated its timestamp within a suitable time, have the monitoring thread kill it and restart it.

Or, see this answer
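
The timestamp idea could be sketched roughly as follows. Heartbeat and find_stalled are hypothetical names, and the timeout value is whatever "a suitable time" means for the crawl:

```python
import threading
import time


class Heartbeat:
    """Thread-safe 'last seen' timestamp that a worker updates as it goes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic()

    def beat(self):
        # Called by the worker, e.g. after every page it finishes.
        with self._lock:
            self._last = time.monotonic()

    def seconds_since(self):
        with self._lock:
            return time.monotonic() - self._last


def find_stalled(heartbeats, max_silence):
    """Return the names whose last heartbeat is older than max_silence seconds."""
    return [name for name, hb in heartbeats.items()
            if hb.seconds_since() > max_silence]
```

The monitoring thread would call find_stalled periodically. Python's threading module offers no supported way to kill a thread from outside, so in practice "kill it and restart" means asking the stalled worker to stop cooperatively (for instance through a threading.Event it checks) and starting a replacement.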

janneb answered Dec 29 '22 01:12