Multiple Python Processes slow

I have a Python script which makes a number of HTTP requests, via httplib and urllib, to various domains.

We have a huge number of domains to process and need to do this as quickly as possible. As HTTP requests are slow (i.e. they could time out if there is no website on the domain), I run a number of the scripts at any one time, feeding them from a domain list in the database.

The problem I see is that over a period of time (a few hours to 24 hours) the scripts all start to slow down, and ps -al shows they are sleeping.

The servers are very powerful (8 cores, 72 GB RAM, 6 TB RAID 6, 80 MB 2:1 connection, etc.) and are never maxed out, i.e. free -m shows

-/+ buffers/cache:      61157      11337
Swap:         4510        195       4315

top shows between 80-90% idle

sar -d shows average 5.3% util

and, more interestingly, iptraf starts off at around 50-60 MB/s and ends up at 8-10 MB/s after about 4 hours.

I am currently running around 500 instances of the script on each server (2 servers), and both show the same problem.

ps -al shows that most of the Python scripts are sleeping, which I don't understand. For instance:

0 S 0 28668  2987  0  80   0 - 71003 sk_wai pts/2 00:00:03 python
0 S 0 28669  2987  0  80   0 - 71619 inet_s pts/2 00:00:31 python
0 S 0 28670  2987  0  80   0 - 70947 sk_wai pts/2 00:00:07 python
0 S 0 28671  2987  0  80   0 - 71609 poll_s pts/2 00:00:29 python
0 S 0 28672  2987  0  80   0 - 71944 poll_s pts/2 00:00:31 python
0 S 0 28673  2987  0  80   0 - 71606 poll_s pts/2 00:00:26 python
0 S 0 28674  2987  0  80   0 - 71425 poll_s pts/2 00:00:20 python
0 S 0 28675  2987  0  80   0 - 70964 sk_wai pts/2 00:00:01 python
0 S 0 28676  2987  0  80   0 - 71205 inet_s pts/2 00:00:19 python
0 S 0 28677  2987  0  80   0 - 71610 inet_s pts/2 00:00:21 python
0 S 0 28678  2987  0  80   0 - 71491 inet_s pts/2 00:00:22 python

There is no sleep call in the script that gets executed, so I can't understand why ps -al shows most of them asleep, or why they should get slower and slower, making fewer HTTP requests over time, when CPU, memory, disk access and bandwidth are all available in abundance.

If anyone could help I would be very grateful.

EDIT:

The code is massive, as I am using exceptions throughout it to catch diagnostics about each domain, i.e. the reasons I can't connect. I will post the code somewhere if needed, but the fundamental calls via httplib and urllib are straight off the Python examples.
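
For reference, here is roughly what such calls look like, a minimal sketch based on the stock library examples (www.example.com is a placeholder; the real script wraps these in exception handlers):

import httplib
import urllib2

# Plain httplib request, straight from the library docs
conn = httplib.HTTPConnection("www.example.com", timeout=10)
conn.request("GET", "/")
resp = conn.getresponse()
print resp.status, resp.reason
conn.close()

# The urllib2 equivalent
f = urllib2.urlopen("http://www.example.com/", timeout=10)
page = f.read()
f.close()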

More info:

Both quota -u mysql and quota -u root come back with nothing.

ulimit -n comes back with 1024. I have changed limits.conf to allow the mysql user 16000 soft and hard open files, and I am able to run over 2000 scripts so far, but I still see the problem.

SOME PROGRESS

OK, so I have changed all the limits for the user and ensured all sockets are closed (they were not). Although things are better, I am still getting a slowdown, though not as bad.

Interestingly, I have also noticed a memory leak: the scripts use more and more memory the longer they run, but I am not sure what is causing it. I store output data in a string and print it to the terminal after every iteration, and I clear the string at the end too. Could the ever-increasing memory be down to the terminal storing all the output?

Edit: No, it seems not. I ran up 30 scripts without outputting to the terminal and still saw the same leak. I'm not using anything clever (just strings, httplib and urllib), so I wonder if there are any issues with the Python MySQL connector...?
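
One way to watch a leak like this from inside a script (a sketch; 'domains' and 'check_domain' are placeholder names standing in for the existing loop) is to log the process's own peak resident set size every so often:

import resource

def peak_rss_kb():
    # Peak resident set size of this process; ru_maxrss is in KB on Linux
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

start = peak_rss_kb()
for i, domain in enumerate(domains):    # 'domains': the existing work list
    check_domain(domain)                # placeholder for the per-domain work
    if i % 100 == 0:
        print "iteration %d: peak RSS %d KB (started at %d KB)" % (i, peak_rss_kb(), start)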

asked Dec 22 '22 by dan360


1 Answer

Check the ulimit and quota for the box and the user running the scripts. /etc/security/limits.conf may also contain resource restrictions that you might want to modify.

ulimit -n will show the max number of open file descriptors allowed.

  • Might this have been exceeded with all of the open sockets? (A sketch for checking the limit from inside Python follows this list.)
  • Is the script closing each socket when it's done with it?
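
On the first point, here is a quick way to check (and, within the hard cap, raise) the fd limit from inside one of the scripts, a minimal sketch using the standard resource module:

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print "fd limit: soft=%d, hard=%d" % (soft, hard)

# Raise the soft limit up to the hard limit for this process only;
# raising the hard limit itself needs root / limits.conf.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))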

You can also check the fd's with ls -l /proc/[PID]/fd/ where [PID] is the process id of one of the scripts.
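
If counting those by hand across 500 processes is tedious, a small Linux-only sketch that tallies them via /proc (the PID below is one of the script PIDs from the ps output above; run it as the same user or root):

import os

def fd_counts(pid):
    # Each entry in /proc/<pid>/fd is one open file descriptor;
    # socket fds are symlinks that look like "socket:[12345]"
    fd_dir = "/proc/%d/fd" % pid
    links = [os.readlink(os.path.join(fd_dir, name))
             for name in os.listdir(fd_dir)]
    sockets = [l for l in links if l.startswith("socket:")]
    return len(links), len(sockets)

total, socks = fd_counts(28668)
print "%d fds open, %d of them sockets" % (total, socks)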

I would need to see some code to tell what's really going on.


Edit (Importing comments and more troubleshooting ideas):

  • Can you show the code where you're opening and closing the connections?
  • When just a few script processes are running, do they too start to go idle after a while? Or does this only happen when there are several hundred or more running at once?
  • Is there a single parent process that starts all of these scripts?

If you're using s = urllib2.urlopen(someURL), make sure to s.close() when you're done with it. Python can often close things down for you (like if you're doing x = urllib2.urlopen(someURL).read()), but it will leave that to you if told to (such as when you assign the return value of .urlopen() to a variable). Double check your opening and closing of the urllib calls (or all I/O code, to be safe). If each script is designed to have only 1 open socket at a time, but /proc/PID/fd is showing multiple active/open sockets per script process, then there is definitely a code issue to fix.
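
One way to make the close unconditional (a sketch using the standard library's contextlib.closing, not something from the original script):

import contextlib
import urllib2

with contextlib.closing(urllib2.urlopen(some_url)) as s:
    data = s.read()
# close() has been called here, even if read() raised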

ulimit -n showing 1024 means the mysql user can have at most 1024 open sockets/fd's. You can change this with ulimit -S -n [LIMIT_#], but check out this article first:
Changing process.max-file-descriptor using 'ulimit -n' can cause MySQL to change the table_open_cache value.

You may need to log out and shell back in afterwards, and/or add it to /etc/bashrc (don't forget to source /etc/bashrc if you change it and don't want to log out and back in).

Disk space is another thing that I have found out (the hard way) can cause very weird issues. I have had processes act like they are running (not zombied) but not doing what is expected because they had open handles to a log file on a partition with zero disk space left.

netstat -anpTee | grep -i mysql will also show if these sockets are connected/established/waiting to be closed/waiting on timeout/etc.

watch -n 0.1 'netstat -anpTee | grep -i mysql' to see the sockets open/close/change state/etc in real time in a nice table output (may need to export GREP_OPTIONS= first if you have it set to something like --color=always).

lsof -u mysql or lsof -U will also show you open FD's (the output is quite verbose).


import socket
import urllib2

# Set a default timeout (in seconds) for all new sockets.
socket.setdefaulttimeout(15)
# or setdefaulttimeout(0) for non-blocking mode (blocking is the
# default): if a recv() call doesn't find any data, or if a send()
# call can't immediately dispose of the data, an error exception
# is raised.

# ......

s = None
try:
    s = urllib2.urlopen(some_url)
    # do stuff with s like s.read(), s.headers, etc.
except (urllib2.HTTPError, urllib2.URLError, socket.error):
    # myLogger.exception("Error opening: %s!", some_url)
    pass
finally:
    if s is not None:
        s.close()
    # del s - although, I don't know if deleting s will help things any.


answered Dec 24 '22 by chown