I have a python client which pushes a great deal of data through the standard library's httlib. Users are complainging that the application is slow. I suspect that this may be partly due to the HTTP client I am using.
Could I improve performance by replacing httplib with something else?
I've seen that twisted offers a HTTP client. It seems to be very basic compared to their other protocol offerings.
PyCurl might be a valid alternative, however it's use seems to be very un-pythonic, on the other hand if it's performance is really good then I can put up with a bit of un-pythonic code.
So if you have experience of better HTTP client libraries of python please tell me about it. I'd like to know what you thought of the performance relative to httplib and what you thought of the quality of implementation.
UPDATE 0: My use of httplib is actually very limited - the replacement needs to do the following:
conn = httplib.HTTPConnection(host, port)
conn.request("POST", url, params, headers)
compressedstream = StringIO.StringIO(conn.getresponse().read())
That's all: No proxies, redirection or any fancy stuff. It's plain-old HTTP. I just need to be able to do it as fast as possible.
UPDATE 1: I'm stuck with Python2.4 and I'm working on Windows 32. Please do not tell me about better ways to use httplib - I want to know about some of the alternatives to httplib.
Source code: Lib/httplib.py. This module defines classes which implement the client side of the HTTP and HTTPS protocols. It is normally not used directly — the module urllib uses it to handle URLs that use HTTP and HTTPS. HTTPS support is only available if the socket module was compiled with SSL support.
Users are complainging that the application is slow. I suspect that this may be partly due to the HTTP client I am using.
Could I improve performance by replacing httplib with something else?
Do you suspect it or are you sure that that it's httplib
? Profile before you do anything to improve the performance of your app.
I've found my own intuition on where time is spent is often pretty bad (given that there isn't some code kernel executed millions of times). It's really disappointing to implement something to improve performance then pull up the app and see that it made no difference.
If you're not profiling, you're shooting in the dark!
Often when I've had performance problems with httplib, the problem hasn't been with the httplib itself, but with how I'm using it. Here are a few common pitfalls:
(1) Don't make a new TCP connection for every web request. If you are making lots of request to the same server, instead of this pattern:
conn = httplib.HTTPConnection("www.somewhere.com") conn.request("GET", '/foo') conn = httplib.HTTPConnection("www.somewhere.com") conn.request("GET", '/bar') conn = httplib.HTTPConnection("www.somewhere.com") conn.request("GET", '/baz')
Do this instead:
conn = httplib.HTTPConnection("www.somewhere.com") conn.request("GET", '/foo') conn.request("GET", '/bar') conn.request("GET", '/baz')
(2) Don't serialize your requests. You can use threads or asynccore or whatever you like, but if you are making multiple requests from different servers, you can improve performance by running them in parallel.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With