unable to requests.get() a website, 'Remote end closed connection without response'

When I try to send a request to this website:

import requests
requests.get('https://www.ldoceonline.com/')

an exception is raised:

requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

The weird part is that if you access the website the normal way (via a browser), it is fully functional and responds well. Only when you try to retrieve information by web scraping do you get this error.

Any idea how to scrape it successfully?

lilpig asked May 30 '18 06:05

2 Answers

If you inspect the requests module's code, you will find the default headers it attaches to every request. A User-Agent header is among them.

It seems that some web resources (whether intentionally or not) refuse to respond properly when the User-Agent header is set to a value like "python-requests/2.21.0".
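You can check what the library sends by default without any network traffic; a quick sketch, assuming the requests package is installed:

```python
import requests

# default_headers() returns the headers attached to every request,
# including the tell-tale "python-requests/<version>" User-Agent.
defaults = requests.utils.default_headers()
print(defaults["User-Agent"])
```

This is the string the server sees, which explains why swapping in a browser-like value changes its behavior.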

So the practical solution is to use a custom User-Agent header. User-Agent strings for the common browsers are easy to find online.

import requests

url = 'https://www.ldoceonline.com/'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}

r = requests.get(url, headers=headers)
r.raise_for_status()
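If you make several requests to the same site, a Session lets you set the custom User-Agent once instead of passing headers every time. A minimal sketch, preparing the request locally just to confirm which header would actually be sent:

```python
import requests

# Session-level headers are merged into every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": "Mozilla/5.0"})

# prepare_request() applies the merge without hitting the network,
# so we can verify the outgoing User-Agent before sending anything.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.ldoceonline.com/")
)
print(prepared.headers["User-Agent"])
```

A Session also reuses the underlying connection and carries cookies across requests, which helps when scraping more than one page.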
ash17 answered Sep 22 '22 02:09


Try sending a User-Agent header to get the desired response.

import requests

res = requests.get('https://www.ldoceonline.com/', headers={"User-Agent": "Mozilla/5.0"})
print(res.status_code)

Output:

200
SIM answered Sep 22 '22 02:09