
What can I do to improve socket performance in Python 3?

Initial Post

I have a very long-running program in which about 97% of the run time is spent in socket objects created by ftp.retrlines and ftp.retrbinary calls. I have already used processes and threads to parallelize the program. Is there anything else I can do to eke out some more speed?

Example code:

# Get file list
ftpfilelist = []
ftp.retrlines('NLST %s' % ftp_directory, ftpfilelist.append)
# ... filter file list; this part takes almost no time ...
# Download a file
with open(path, 'wb') as fout:
    ftp.retrbinary('RETR %s' % ftp_path, fout.write)

Output from the cProfiler:

5890792 function calls (5888775 primitive calls) in 548.883 seconds

Ordered by: internal time
List reduced from 843 to 50 due to restriction <50>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  9166  249.154    0.027  249.154    0.027 {method 'recv_into' of '_socket.socket' objects}
 99573  230.489    0.002  230.489    0.002 {method 'recv' of '_socket.socket' objects}
  1767   53.113    0.030   53.129    0.030 {method 'connect' of '_socket.socket' objects}
 98808    2.839    0.000    2.839    0.000 {method 'write' of '_io.BufferedWriter' objects}
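
A report in this shape can be produced with the standard library's cProfile and pstats modules. The following is a minimal sketch, not the harness that produced the numbers above; profile_call and its top parameter are hypothetical names introduced for illustration.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=50, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sorting by 'tottime' is what the report header calls "internal time";
    # print_stats(top) limits the listing to the first `top` entries.
    stats.sort_stats('tottime').print_stats(top)
    return result, out.getvalue()
```

Calling profile_call on the function that drives the downloads, then printing the returned text, should give a listing ordered by internal time like the one above.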

Follow Up

Results for a gevent fork (https://github.com/fantix/gevent) supporting Python 3.4.1:

7645675 function calls (7153156 primitive calls) in 301.813 seconds

Ordered by: internal time
List reduced from 948 to 50 due to restriction <50>

ncalls       tottime  percall  cumtime  percall filename:lineno(function)
107541/4418  281.228    0.003  296.499    0.067 gevent/hub.py:354(wait)
99885/59883    4.466    0.000  405.922    0.007 gevent/_socket3.py:248(recv)
99097          2.244    0.000    2.244    0.000 {method 'write' of '_io.BufferedWriter' objects}
111125/2796    1.036    0.000    0.017    0.000 gevent/hub.py:345(switch)
107543/2788    1.000    0.000    0.039    0.000 gevent/hub.py:575(get)

Results for concurrent.futures.ThreadPoolExecutor:

5319963 function calls (5318875 primitive calls) in 359.541 seconds

Ordered by: internal time
List reduced from 872 to 50 due to restriction <50>

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    31  349.876   11.286  349.876   11.286 {method 'acquire' of '_thread.lock' objects}
  2652    3.293    0.001    3.293    0.001 {method 'recv' of '_socket.socket' objects}
310270    0.790    0.000    0.790    0.000 {method 'timetuple' of 'datetime.date' objects}
    25    0.661    0.026    0.661    0.026 {method 'recv_into' of '_socket.socket' objects}
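
For reference, a thread pool run of this kind could be structured roughly as follows. This is a sketch under assumptions, not the code that produced the numbers above: fetch_one, fetch_all, the connect parameter, the anonymous login, and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP


def fetch_one(host, remote_path, local_path, connect=FTP):
    """Download one file over its own connection (nothing is shared between threads)."""
    with connect(host) as ftp:
        ftp.login()  # anonymous login is an assumption
        with open(local_path, 'wb') as fout:
            ftp.retrbinary('RETR %s' % remote_path, fout.write)
    return local_path


def fetch_all(host, jobs, max_workers=8, connect=FTP):
    """jobs is an iterable of (remote_path, local_path) pairs."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_one, host, remote, local, connect)
                   for remote, local in jobs]
        # result() blocks until the worker finishes and re-raises its error, if any.
        return [future.result() for future in futures]
```

With a layout like this, the main thread spends nearly all of its time blocked while the workers do the socket reads. cProfile only records the thread it was started in, which is most likely why the listing above is dominated by lock acquire calls rather than recv.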

Conclusion: for my use case, gevent was about 20% faster than the thread pool (301.8 s versus 359.5 s)!

asked Jun 26 '26 22:06 by user12345678



1 Answer

Take a look at gevent. It can monkey-patch the standard library modules that your libraries are built on (such as the socket module underneath your FTP library), improving socket performance by using cooperative threads.

The general premise is that threads are not very efficient for I/O-heavy programs: the scheduler doesn't know whether a thread is waiting on a network operation, so a thread may be scheduled only to sit waiting on I/O while other threads could actually be doing work.

With gevent, as soon as one of your threads (called greenlets) hits a blocking network call, gevent automatically switches to another greenlet. Through this mechanism, the time one greenlet spends waiting on the network is used to run the others.
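
As a rough sketch of what this could look like for FTP downloads: the names download and download_all, the connect parameter, the anonymous login, and the concurrency value are all assumptions made for illustration. gevent is a third-party package, and it is imported inside the function here only to keep the sketch self-contained.

```python
from ftplib import FTP


def download(host, remote_path, local_path, connect=FTP):
    """Fetch one file over its own FTP connection."""
    with connect(host) as ftp:
        ftp.login()  # anonymous login is an assumption
        with open(local_path, 'wb') as fout:
            ftp.retrbinary('RETR %s' % remote_path, fout.write)
    return local_path


def download_all(host, jobs, concurrency=20):
    """Run download() for each (remote_path, local_path) pair in a greenlet."""
    # patch_all() replaces the blocking socket calls that ftplib uses with
    # cooperative ones. In a real program it belongs at the very top of the
    # entry script, before anything else is imported.
    from gevent import monkey
    monkey.patch_all()
    from gevent.pool import Pool

    pool = Pool(concurrency)  # at most `concurrency` greenlets run at once
    return pool.map(lambda job: download(host, *job), jobs)
```

Each greenlet that blocks in recv yields to the others, so many transfers can be in flight on a single OS thread. The best concurrency value depends on the server and the network, so it is worth measuring rather than guessing.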

Here's a great introduction to this library: http://www.gevent.org/intro.html#example

answered Jun 29 '26 10:06 by 14 revs, 12 users 16%



