Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python request.get fails to get an answer for a url I can open on my browser

I'm learning how to use python requests (Python 3) and I am trying to make a simple requests.get to get the HTML code from several websites. Although it works for most of them, there is one I am having trouble with.

When I call : http://es.rs-online.com/ everything works fine:

In [1]: import requests
   ...:html = requests.get("http://es.rs-online.com/")
In [2]:html
Out[2]: <Response [200]>

However, when I try it with http://es.farnell.com/, python is unable to solve the address and keeps working on it forever. If I set a timeout, no matter how long, the requests.get() will always be interrupted by the timeout and by nothing else. I have also tried adding headers but it didn't solve the issue. Also, I don't think the error has anything to do with the proxy that I'm using, as I am able to open this website in my browser. Currently, my code looks like this:

import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
html = requests.get("http://es.farnell.com/",headers=headers, timeout=5, allow_redirects = True )

After 5 secs, I get the expected timeout notification.

ReadTimeout: HTTPConnectionPool(host='es.farnell.com', port=80): Read timed out. (read timeout=5)

Does anyone know what could be the issue?

like image 586
ASj Avatar asked Jul 03 '18 12:07

ASj


People also ask

How do you pass a query parameter in url request in Python?

How do you pass a query parameter in Python request? To send parameters in URL, write all parameter key:value pairs to a dictionary and send them as params argument to any of the GET, POST, PUT, HEAD, DELETE or OPTIONS request. then https://somewebsite.com/?param1=value1&param2=value2 would be our final url.

What does request Get url do?

The get() method sends a GET request to the specified url.


1 Answers

The problem is in your header. Do remember that some site are more lenient than others when it comes to the content of the header you are sending. In order to fix the issue, you should replace your current header with:

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}

I would also recommend you to send the get request to https://es.farnell.com/ rather than http://es.farnell.com/, remove the timeout = 5 and remove allow_redirects = True (as it is True by default).


All in all your code should look like this:

import requests


headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}
html = requests.get("https://es.farnell.com",headers=headers)

hope this helps.

like image 57
Nazim Kerimbekov Avatar answered Oct 03 '22 23:10

Nazim Kerimbekov