I have a Python 3.4 script that fetches multiple web pages. At first, I used the requests library to fetch pages:
import requests

def get_page_requests(url):
    r = requests.get(url)
    return r.content
The above code gives an average speed of 4.6 requests per second. To increase the speed, I rewrote the function to use the socket module directly:
import socket
from urllib.parse import urlparse

def get_page_socket(url):
    url = urlparse(url)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((url.netloc, 80))
    req = 'GET {} HTTP/1.1\r\nHost: {}\r\nConnection: Keep-Alive\r\n\r\n'.format(
        url.path, url.netloc)
    sock.sendall(req.encode())
    reply = b''
    while True:
        chunk = sock.recv(65535)
        if chunk:
            reply += chunk
        else:
            break
    sock.close()
    return reply
And the average speed fell to 4.04 requests per second. I was not hoping for a dramatic speed boost, but I was expecting a slight increase, as sockets are more low-level. Is this a library issue, or am I doing something wrong?
requests uses urllib3, which handles HTTP connections very efficiently. Connections to the same server are re-used wherever possible, saving you the socket connection and teardown costs:

- Re-uses the same socket connection for multiple requests, with optional client-side certificate verification. See: HTTPConnectionPool and HTTPSConnectionPool
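A rough sketch of what that pooling looks like from the requests side; the pool sizes here are arbitrary illustrative values, not anything prescribed by requests:

```python
import requests
from requests.adapters import HTTPAdapter

# A Session keeps a urllib3 connection pool per host, so repeated
# requests to the same server re-use an already-open socket instead
# of paying the TCP handshake cost on every fetch.
session = requests.Session()

# Pool sizes are configurable; 10 is an arbitrary example value.
adapter = HTTPAdapter(pool_connections=10, pool_maxsize=10)
session.mount('http://', adapter)
session.mount('https://', adapter)

# Every session.get() / session.post() to a matching URL now goes
# through the mounted adapter and its connection pool.
```

Using a plain `requests.get()` call in a loop, as in the question, creates and tears down a connection per request; a Session avoids exactly that cost.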
In addition, urllib3 and requests advertise to the server that they can handle compressed responses; with compression you can transfer more data in the same amount of time, leading to more requests per second.

- Supports gzip and deflate decoding. See: decode_gzip() and decode_deflate()
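The effect of compression can be sketched with the standard library alone; the payload below is a made-up, highly compressible example:

```python
import gzip
import zlib

body = b'x' * 1000  # illustrative, highly compressible payload

# What a server does when the client sends Accept-Encoding: gzip
compressed = gzip.compress(body)

# The same 1000 bytes travel as far fewer bytes on the wire
assert len(compressed) < len(body)

# Decompressing on the client side recovers the original body,
# which is the kind of decoding urllib3 performs transparently
assert gzip.decompress(compressed) == body

# deflate is the same idea with zlib framing
assert zlib.decompress(zlib.compress(body)) == body
```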
urllib3 uses sockets too (albeit via the http.client module); there is little point in reinventing this wheel. Perhaps you should think about fetching URLs in parallel instead, using threading, multiprocessing, or eventlets; the requests author has a gevent-requests integration package (grequests) that can help there. Another way of achieving concurrency would be to use asyncio combined with aiohttp, as HTTP requests spend most of their time waiting on network I/O.
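A minimal sketch of the threading approach, using a stand-in fetch function that only simulates network latency (real code would call requests.get, ideally through a shared Session, instead of sleeping):

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the question's get_page function: sleeps to simulate
# network latency instead of hitting a real server.
def fetch(url):
    time.sleep(0.2)
    return ('<html>%s</html>' % url).encode()

# Hypothetical URL list for illustration only.
urls = ['http://example.com/%d' % i for i in range(10)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    pages = list(pool.map(fetch, urls))
elapsed = time.monotonic() - start

# Because the threads wait on I/O concurrently, the wall-clock time
# is close to one fetch (~0.2s) rather than 10 * 0.2 = 2 seconds.
print(len(pages), round(elapsed, 1))
```

The same shape works with real requests calls, since the GIL is released while a thread blocks on the network.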