I'm learning how to use the Python requests library (Python 3), and I am trying to make a simple requests.get call to fetch the HTML from several websites. It works for most of them, but there is one I am having trouble with.
When I request http://es.rs-online.com/, everything works fine:
In [1]: import requests
   ...: html = requests.get("http://es.rs-online.com/")
In [2]: html
Out[2]: <Response [200]>
However, when I try it with http://es.farnell.com/, Python seems unable to resolve the address and the call hangs forever. If I set a timeout, no matter how long, requests.get() is always interrupted by the timeout and by nothing else. I have also tried adding headers, but that didn't solve the issue. I don't think the error has anything to do with the proxy I'm using, since I can open this website in my browser. Currently, my code looks like this:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
html = requests.get("http://es.farnell.com/", headers=headers, timeout=5, allow_redirects=True)
After 5 seconds, I get the expected timeout error:
ReadTimeout: HTTPConnectionPool(host='es.farnell.com', port=80): Read timed out. (read timeout=5)
Does anyone know what could be the issue?
How do you pass query parameters in a Python request? To send parameters in the URL, write all the key: value pairs to a dictionary and pass it as the params argument of any GET, POST, PUT, HEAD, DELETE or OPTIONS request. For example, with the parameters param1=value1 and param2=value2, the final URL would be https://somewebsite.com/?param1=value1&param2=value2.
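As a minimal sketch of the params argument described above: the snippet builds the request without sending it, so the final URL can be inspected offline. The parameter names and the somewebsite.com URL are just the placeholders from the example, not a real endpoint.

```python
import requests

# Placeholder parameter names and values, taken from the example URL above.
params = {"param1": "value1", "param2": "value2"}

# Build (but do not send) the request; requests encodes the dictionary
# into the query string for us.
prepared = requests.Request(
    "GET", "https://somewebsite.com/", params=params
).prepare()

print(prepared.url)
```

Passing the same dictionary as requests.get(url, params=params) should produce the same URL when the request is actually sent.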
The get() method sends a GET request to the specified url.
The problem is in your headers. Remember that some sites are more lenient than others about the content of the headers you send. To fix the issue, replace your current headers with:
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}
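To check which headers would actually go out, one option is to attach them to a Session and inspect the prepared request before anything is sent. This is only a sketch: the Session usage is my own suggestion, not something the fix requires, and no network call is made here.

```python
import requests

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}

# A Session merges its own headers into every request it prepares,
# overriding the library defaults (such as the python-requests User-Agent).
session = requests.Session()
session.headers.update(headers)

# Prepare the request without sending it, then list the merged headers.
prepared = session.prepare_request(
    requests.Request("GET", "https://es.farnell.com/")
)
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

Calling session.send(prepared) or session.get(...) afterwards would send the request with exactly these headers.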
I would also recommend sending the GET request to https://es.farnell.com/ rather than http://es.farnell.com/, removing timeout=5, and removing allow_redirects=True (as it is True by default).
All in all, your code should look like this:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36', "Upgrade-Insecure-Requests": "1","DNT": "1","Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8","Accept-Language": "en-US,en;q=0.5","Accept-Encoding": "gzip, deflate"}
html = requests.get("https://es.farnell.com", headers=headers)
Hope this helps.
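If you would rather keep a timeout than drop it entirely, a sketch like the one below may help tell the failure modes apart. A ReadTimeout, as in the question, means the connection was established but the server did not send a response in time, which fits a server that stalls on requests it does not like. The fetch helper and the 5 and 15 second values are hypothetical, chosen only for illustration.

```python
import requests


def fetch(url, headers=None, connect_timeout=5, read_timeout=15):
    """Return (response, None) on success, or (None, reason) on a timeout.

    Hypothetical helper: the name and the default timeouts are assumptions.
    """
    try:
        # requests accepts a (connect, read) tuple, so the two phases
        # can be limited separately.
        response = requests.get(
            url, headers=headers, timeout=(connect_timeout, read_timeout)
        )
    except requests.exceptions.ConnectTimeout:
        # The connection itself could not be established in time.
        return None, "connect timeout: could not reach the server"
    except requests.exceptions.ReadTimeout:
        # Connected, but the server sent nothing back in time.
        return None, "read timeout: server accepted the connection but did not answer"
    return response, None
```

For example, response, reason = fetch("https://es.farnell.com", headers=headers) would give either a response object or a short explanation of which phase timed out.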