I'm trying to write a simple web crawler using the Requests module, and I would like to know how to disable its default keep-alive feature.
I tried using:
s = requests.session()
s.config['keep_alive'] = False
However, I get an error stating that the session object has no attribute 'config'. I think this was changed in a newer version, but I cannot seem to find how to do it in the official documentation.
The truth is that when I run the crawler on a certain website, it only gets five pages at most and then keeps looping around infinitely, so I thought it might have something to do with the keep-alive feature!
PS: Is Requests a good module for a web crawler? Is there something better suited?
Thank you!
For background: the HTTP keep-alive mechanism maintains a connection between a client and a server, reducing the time needed to serve files. A persistent connection also reduces the number of TCP and SSL/TLS connection setups, leading to a drop in round-trip time (RTT).
By default, an HTTP connection is usually closed after each request has been completed, meaning that the server closes the TCP connection after delivering the response. To keep the connection open across multiple requests, the keep-alive connection header can be used.
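To make this concrete, here is a minimal sketch (the URL and paths are placeholders) of how a requests.Session keeps one connection open and reuses it for several requests to the same host:

import requests

s = requests.Session()
# Consecutive requests through the same session reuse the pooled
# TCP connection (keep-alive) instead of opening a new one each time.
for path in ("/", "/about", "/contact"):
    r = s.get("https://example.com" + path)
    print(path, r.status_code)
s.close()  # releases the pooled connections held by the session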
This works:
s = requests.session()
s.keep_alive = False  # disable keep-alive for this session (undocumented attribute)
Answered in the comments of a similar question.
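For context, here is a short sketch of how that flag would be used in a crawl loop; note that keep_alive is not a documented Session attribute, and the URLs below are placeholders:

import requests

s = requests.session()
s.keep_alive = False  # per the answer above; assumed to stop the session from reusing connections

for url in ("https://example.com/page1", "https://example.com/page2"):
    r = s.get(url)
    print(url, r.status_code)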
I am not sure, but you could try passing {"Connection": "close"} as an HTTP header when sending a GET request with requests. This should close the connection as soon as the server returns a response.
>>> import requests
>>> headers = {"Connection": "close"}
>>> r = requests.get('https://example.com', headers=headers)
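If you want every request to send that header without repeating it, you can also set it once on a session; a small sketch with a placeholder URL:

>>> import requests
>>> s = requests.session()
>>> s.headers.update({"Connection": "close"})  # this header is now sent with every request from this session
>>> r = s.get('https://example.com')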