monitor stuck python processes

I have a Python script that performs URL requests using urllib2. I have a pool of 5 processes that run asynchronously and each execute a function. This function makes the URL calls, gets the data, parses it into the required format, performs calculations, and inserts the data. The amount of data varies for each URL request.

I run this script every 5 minutes using a cron job. Sometimes when I do ps -ef | grep python, I see stuck processes. Is there a way, within the multiprocessing module, to keep track of the processes and their state, meaning completed, stuck, dead, and so on? Here is a code snippet:

This is how I call the async processes:

from multiprocessing import Pool

pool = Pool(processes=5)
pool.apply_async(getData)

And the following is the part of getData that performs the urllib2 requests:

import sys
import urllib2
from urllib2 import URLError

try:
    Url = "http://gotodatasite.com"
    data = urllib2.urlopen(Url).read().split('\n')
except URLError, e:
    # URLError only carries a .code when it is actually an HTTPError
    print "Error:", getattr(e, 'code', '')
    print e.reason
    sys.exit(0)

Is there a way to track stuck processes and rerun them?

asked Nov 14 '22 by ash

1 Answer

If you're inclined to stick with multiprocessing, implement a ping mechanism. You're looking for processes that have become stuck because of slow I/O, I assume?
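For example, here is a minimal sketch of such a ping (heartbeat) mechanism. The worker body, the placeholder URL list, the 60-second timeout and the 5-second polling interval are all illustrative, not part of the original code. Each worker writes a timestamp into a shared dict before and after its slow call, and the parent terminates any worker whose last ping is too old:

import time
import urllib2
from multiprocessing import Process, Manager

TIMEOUT = 60  # seconds without a ping before a worker counts as stuck

def worker(url, heartbeats, key):
    heartbeats[key] = time.time()        # ping before the slow call
    data = urllib2.urlopen(url).read()   # the slow I/O
    # ... parse, calculate, insert ...
    heartbeats[key] = time.time()        # ping once the job is done

if __name__ == '__main__':
    urls = ["http://gotodatasite.com"]   # placeholder list of URLs
    manager = Manager()
    heartbeats = manager.dict()
    procs = {}
    for key, url in enumerate(urls):
        p = Process(target=worker, args=(url, heartbeats, key))
        p.start()
        procs[key] = p

    while any(p.is_alive() for p in procs.values()):
        time.sleep(5)
        now = time.time()
        for key, p in procs.items():
            if p.is_alive() and now - heartbeats.get(key, now) > TIMEOUT:
                p.terminate()            # kill the stuck worker; re-queue its URL to retry
    for p in procs.values():
        p.join()

Plain Process objects are used here instead of a Pool because a Pool does not expose its workers for individual termination.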

Personally I would go with a queue (not necessarily a queue server): say ~/jobs is a list of URLs to work on, and a program takes the first job and performs it. Then it's just a matter of bookkeeping, e.g. have the program note when it was started and what its PID is. If you need to kill slow jobs, just kill the PID and mark the job as failed.
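A rough sketch of that bookkeeping, assuming ~/jobs holds one URL per line with a log file alongside it; the file names, log format, and process_url stub are only illustrative:

import os
import time
import urllib2

JOBS_FILE = os.path.expanduser("~/jobs")     # one URL per line (assumed layout)
LOG_FILE = os.path.expanduser("~/jobs.log")  # bookkeeping: pid, status, time, url

def process_url(url):
    # stand-in for the real work: fetch, parse, calculate, insert
    return urllib2.urlopen(url).read()

def take_first_job():
    # pop the first URL from the jobs file and rewrite the remainder
    if not os.path.exists(JOBS_FILE):
        return None
    with open(JOBS_FILE) as f:
        urls = [line.strip() for line in f if line.strip()]
    if not urls:
        return None
    with open(JOBS_FILE, "w") as f:
        for url in urls[1:]:
            f.write(url + "\n")
    return urls[0]

if __name__ == '__main__':
    url = take_first_job()
    if url:
        # record the PID and start time so a watchdog (or a human) can
        # spot a job that never finished, kill the PID, and mark it failed
        with open(LOG_FILE, "a") as log:
            log.write("%d started %d %s\n" % (os.getpid(), int(time.time()), url))
        process_url(url)
        with open(LOG_FILE, "a") as log:
            log.write("%d done %d %s\n" % (os.getpid(), int(time.time()), url))

A companion watchdog run from the same cron could scan the log for "started" entries with no matching "done" older than some threshold, kill those PIDs, mark the jobs failed, and append their URLs back to ~/jobs so the next run retries them.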

answered Dec 18 '22 by lericson