
Why socket implementation is slower than requests?

I have a Python 3.4 script that fetches multiple web pages. At first, I used the requests library to fetch pages:

def get_page_requests(url):
    r = requests.get(url)
    return r.content

The above code gives an average speed of 4.6 requests per second. To increase speed, I rewrote the function to use the socket library:

def get_page_socket(url):

    url = urlparse(url)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((url.netloc, 80))
    req = ('GET {} HTTP/1.1\r\n'
           'Host: {}\r\n'
           'Connection: Keep-Alive\r\n'
           '\r\n').format(url.path or '/', url.netloc)
    sock.sendall(req.encode())
    reply = b''
    while True:
        chunk = sock.recv(65535)
        if chunk:
            reply += chunk
        else:
            break
    sock.close()
    return reply

And the average speed fell to 4.04 requests per second. I was not hoping for a dramatic speed boost, but I was hoping for a slight increase, since sockets are more low-level. Is this a library issue, or am I doing something wrong?

eyeinthebrick asked Sep 06 '14
1 Answer

requests uses urllib3, which handles HTTP connections very efficiently. Connections to the same server are re-used wherever possible, saving you the socket connection and teardown costs:

  • Re-use the same socket connection for multiple requests, with optional client-side certificate verification. See: HTTPConnectionPool and HTTPSConnectionPool
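Connection reuse can be seen with the standard library alone. The sketch below is an illustration, not urllib3's actual code: it starts a hypothetical local test server, then sends two requests over a single `http.client.HTTPConnection`, so only one TCP handshake is paid for both fetches.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    protocol_version = 'HTTP/1.1'  # HTTP/1.1 keeps the connection open by default

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One HTTPConnection object keeps the underlying socket open,
# so both requests travel over the same TCP connection.
conn = http.client.HTTPConnection('127.0.0.1', server.server_port)
for _ in range(2):
    conn.request('GET', '/')
    resp = conn.getresponse()
    data = resp.read()  # must drain the body before reusing the connection
conn.close()
server.shutdown()
```

urllib3's connection pools do essentially this bookkeeping for you, across many hosts at once.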

In addition, urllib3 and requests advertise to the server that they can handle compressed responses; with compression you can transfer more data in the same amount of time, leading to more requests per second.

  • Supports gzip and deflate decoding. See: decode_gzip() and decode_deflate()

urllib3 uses sockets too (albeit via the http.client module); there is little point in reinventing this wheel. Perhaps you should think about fetching URLs in parallel instead, using threading, multiprocessing, or eventlet; the requests author maintains a gevent-based requests integration package that can help there. Another way of achieving concurrency would be to use asyncio combined with aiohttp, as HTTP requests are mostly waiting for network I/O.
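A minimal sketch of the threading approach, using `concurrent.futures`. The `fetch` function here is a hypothetical stand-in (a `time.sleep` simulating network latency) rather than a real HTTP call, to show the point: because fetches are I/O-bound, eight of them overlap in a thread pool and the batch finishes in roughly the time of one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for a real requests.get(url); the 0.1 s sleep simulates
    # network latency, which dominates HTTP fetch time.
    time.sleep(0.1)
    return url, 200

urls = ['http://example.invalid/page{}'.format(i) for i in range(8)]

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(fetch, urls))
elapsed = time.monotonic() - start

# Eight 0.1 s "fetches" overlap, so the whole batch takes about 0.1 s, not 0.8 s.
print(elapsed)
```

Swapping the sleep for a real `requests.get` inside a shared `Session` combines both wins: concurrency across hosts and connection reuse per host.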

Martijn Pieters answered Oct 31 '22