
Prevent response body download in python async http requests

I want to "ping" a server, check the header response to see if the link is broken, and if it's not broken, actually download the response body.

Traditionally, using the synchronous requests module, you could send a GET request with the stream=True parameter and inspect the headers before the response body is downloaded, then abort the connection in case of an error (not found, for example).
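For reference, the headers-first pattern described above can be sketched with only the standard library. This is not the requests API: it swaps in `http.client`, whose `getresponse()` returns once the status line and headers are parsed. The function name `get_unless_broken` and the "anything but 200 is broken" rule are assumptions made for illustration.

```python
import http.client


def get_unless_broken(host, path="/", port=80, timeout=10):
    """Return (status, body); body is None when the status is not 200."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        # getresponse() parses only the status line and headers. The body
        # is not consumed by this code until read() is called, although the
        # server may already have started sending it.
        response = conn.getresponse()
        if response.status != 200:
            return response.status, None  # closing below abandons the body
        return response.status, response.read()
    finally:
        conn.close()
```

Calling `get_unless_broken("www.google.com")` would return the status and, only for a 200, the body bytes.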

My problem is that doing this with the async libraries grequests or requests-futures is beyond my limited knowledge.

I've tried setting the stream parameter to True in requests-futures, but to no avail: it still downloads the response body without letting me intervene as soon as it gets the response headers. And even if it did, I wouldn't be sure how to proceed.

This is what I've tried:

test.py

from requests_futures.sessions import FuturesSession

session = FuturesSession()
session.stream = True

future = session.get('http://www.google.com')
response = future.result()
print(response.status_code) # Here I would assume the response body hasn't been loaded

Upon debugging I find it downloads the response body either way.

I would appreciate any solution to the initial problem, whether it follows my logic or not.

asked Mar 23 '17 02:03



1 Answer

I believe what you want is an HTTP HEAD request:

session.head('http://www.google.com')

Per w3.org, "the HEAD method is identical to GET except that the server MUST NOT return a message-body in the response." If the status code and headers look good, you can follow up with a normal GET request.
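A minimal sketch of that HEAD-then-GET flow, run concurrently for several URLs. It uses the standard library (`urllib.request` plus a thread pool) rather than requests-futures, and the helper names `head_status`, `fetch_if_alive`, and `fetch_all`, as well as the "only 200 counts as alive" rule, are assumptions for illustration.

```python
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def head_status(url, timeout=10):
    """Return the status code of a HEAD request (no body is transferred)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # urllib raises on 4xx/5xx; the code is still useful


def fetch_if_alive(url, timeout=10):
    """HEAD first; issue the GET only when the HEAD came back 200."""
    status = head_status(url, timeout)
    if status != 200:
        return url, status, None
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return url, response.status, response.read()


def fetch_all(urls, max_workers=8):
    """Run fetch_if_alive for every URL on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_if_alive, urls))
```

The trade-off is two round trips per live link, and some servers answer HEAD differently from GET, so the HEAD result is a hint rather than a guarantee. Network-level failures (DNS errors, timeouts) surface as exceptions and are not handled in this sketch.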

From the comments, it looks like you might also be interested in doing this in a single request. It is possible to do so directly with sockets: send the normal GET request and do a recv of the first block. If you don't like the result, close the connection; otherwise, loop over the remaining blocks.

Here is a proof of concept of how to download conditionally with a single request:

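import socket

def fetch_on_header_condition(host, resource, condition, port=80):
    # Build a minimal HTTP/1.1 request; 'Connection: close' makes the
    # server end the stream after the response, so recv() returns b''.
    request =  'GET %s HTTP/1.1\r\n' % resource
    request += 'Host: %s\r\n' % host
    request += 'Connection: close\r\n'
    request += '\r\n'

    s = socket.socket()
    try:
        s.connect((host, port))
        s.sendall(request.encode('ascii'))  # sockets carry bytes, not str
        # The first block normally holds the status line and headers,
        # though a single recv is not guaranteed to contain all of them.
        first_block = s.recv(4096)
        if not condition(first_block):
            return False, b''               # abort before reading the rest
        blocks = [first_block]
        while True:
            block = s.recv(4096)
            if not block:                   # empty read: server closed
                break
            blocks.append(block)
        return True, b''.join(blocks)
    finally:
        s.close()

if __name__ == '__main__':
    print(fetch_on_header_condition(
        host = 'www.jython.org',
        port = 80,
        resource = '/',
        condition = lambda s: b'Content-Type: text/xml' in s,
    ))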
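
Since the question asks for an async version, here is a rough asyncio counterpart of the proof of concept above. It is a sketch under assumptions: plain HTTP only (no TLS), no redirect handling, and the names `fetch_on_header_condition_async` and `status_is_ok` are made up for illustration. Unlike the socket version, it reads up to the blank line that ends the headers before applying the condition, and it returns only the raw bytes that follow the headers.

```python
import asyncio


def status_is_ok(head):
    """True when the status line of the raw header block reports 200."""
    status_line = head.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    return len(parts) >= 2 and parts[1] == b"200"


async def fetch_on_header_condition_async(host, resource, condition, port=80):
    """One GET over one connection; read the body only if condition(head)."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        request = (
            f"GET {resource} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n"
            "\r\n"
        )
        writer.write(request.encode("ascii"))
        await writer.drain()

        # Read the status line and headers, up to and including the blank line.
        head = await reader.readuntil(b"\r\n\r\n")
        if not condition(head):
            return False, b""  # the finally block closes the connection
        # 'Connection: close' means EOF marks the end of the response.
        # The bytes are returned as received (a chunked body is not decoded).
        body = await reader.read()
        return True, body
    finally:
        writer.close()
        await writer.wait_closed()
```

Several links can then be checked concurrently with `asyncio.gather(...)` over calls to `fetch_on_header_condition_async`, run from `asyncio.run(...)`.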
Raymond Hettinger answered Sep 21 '22 19:09
