When I try to send a request to this website:
import requests
requests.get('https://www.ldoceonline.com/')
an exception is raised:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
The weird part is that if you access the website the normal way (via a browser), it is fully functional and responds very well. You only get this error when you try to retrieve information by web scraping.
Any idea how to scrape it successfully?
If you inspect the requests module's code, you will find the default headers it sends with every request. The User-Agent header is among them.
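You do not have to read the source to see those defaults. As a minimal sketch, the helpers in requests.utils return them without making any network call; the exact version number printed depends on the installed release.

```python
import requests

# Headers that requests attaches to every request unless you override them.
defaults = requests.utils.default_headers()
for name, value in defaults.items():
    print(f"{name}: {value}")

# The User-Agent value on its own, of the form "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print(default_ua)
```

Any header passed through the headers argument of requests.get replaces the default with the same name.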
It seems that a number of web resources, whether intentionally or not, do not handle requests properly when the User-Agent header is set to "python-requests/2.21.0", which is the default value for that version of the library.
So the practical solution is to send a custom User-Agent header. User-Agent strings for different browsers are provided here.
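import requests
url = 'https://www.ldoceonline.com/'
# Present a browser-like User-Agent instead of the default "python-requests/<version>"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"}
r = requests.get(url, headers=headers)
# Raise requests.exceptions.HTTPError if the server answered with a 4xx or 5xx status
r.raise_for_status()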
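If you make several requests to the same site, one option is to set the header once on a Session rather than passing it on every call. The sketch below is only an illustration: make_session and BROWSER_UA are hypothetical names introduced here, and the request is prepared but not sent, so you can check the header without touching the network.

```python
import requests

# Hypothetical constant: the browser-like User-Agent string from the answer above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
)


def make_session(user_agent: str = BROWSER_UA) -> requests.Session:
    """Return a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()

# Build the request without sending it, to inspect the headers it would carry.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.ldoceonline.com/")
)
print(prepared.headers["User-Agent"])
```

Calling session.get(url) afterwards sends the same header, and the Session also reuses the underlying connection between requests.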
Try sending a User-Agent header to get the desired response.
import requests
res = requests.get('https://www.ldoceonline.com/', headers={"User-Agent": "Mozilla/5.0"})
print(res.status_code)
Output:
200