Python requests.get fails to get a response for a URL I can open in my browser

I'm learning how to use Python requests (Python 3) and I am trying to make a simple requests.get call to get the HTML from several websites. It works for most of them, but there is one I am having trouble with.

When I request http://es.rs-online.com/, everything works fine:

In [1]: import requests
   ...: html = requests.get("http://es.rs-online.com/")
In [2]: html
Out[2]: <Response [200]>

However, when I try it with http://es.farnell.com/, Python seems unable to resolve the address and hangs indefinitely. If I set a timeout, however long, requests.get() is always interrupted by the timeout and by nothing else. I have also tried adding headers, but that didn't solve the issue. I don't think the error has anything to do with the proxy I'm using, since I can open this website in my browser. Currently, my code looks like this:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
html = requests.get("http://es.farnell.com/", headers=headers, timeout=5, allow_redirects=True)

After 5 seconds, I get the expected timeout error:

ReadTimeout: HTTPConnectionPool(host='es.farnell.com', port=80): Read timed out. (read timeout=5)

Does anyone know what could be the issue?

asked Jul 03 '18 12:07

ASj

1 Answer

The problem is in your headers. Keep in mind that some sites are stricter than others about the headers you send. This one appears to leave requests that do not look like they come from a browser unanswered, which would explain why you only ever see the timeout. To fix the issue, replace your current headers with:

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
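
To see what the extra fields change, it can help to look at what requests sends by default. The sketch below builds the request without sending it and returns the merged headers. Note that preview_headers and BROWSER_HEADERS are hypothetical names chosen for illustration, not part of requests, and nothing here touches the network.

```python
import requests

# Headers from the answer above, copied verbatim.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36"
    ),
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}


def preview_headers(url, headers=None):
    """Return the headers requests would send for a GET to `url`.

    Hypothetical helper: the request is prepared but never sent.
    """
    session = requests.Session()
    request = requests.Request("GET", url, headers=headers)
    # prepare_request merges the session's default headers with ours;
    # values passed in `headers` override the defaults.
    prepared = session.prepare_request(request)
    return dict(prepared.headers)
```

Calling preview_headers("https://es.farnell.com/") with no headers shows the library defaults, including a User-Agent that starts with python-requests. Passing BROWSER_HEADERS shows those defaults overridden and the Accept-Language, DNT and Upgrade-Insecure-Requests fields added.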

I would also recommend sending the GET request to https://es.farnell.com/ rather than http://es.farnell.com/, removing timeout=5, and removing allow_redirects=True (it is True by default).

All in all, your code should look like this:

import requests

# Browser-like headers; the site seems to ignore requests without them.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36",
    "Upgrade-Insecure-Requests": "1",
    "DNT": "1",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
}
html = requests.get("https://es.farnell.com", headers=headers)
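
If the request still hangs, it can help to tell a connect timeout from a read timeout. The ReadTimeout in the question means the connection to es.farnell.com was opened and the server then sent nothing back in time, which points at how the server treats the request rather than at name resolution. A minimal sketch, assuming a hypothetical helper named diagnose and arbitrary timeout values (it keeps a timeout, since requests otherwise waits with no limit):

```python
import requests


def diagnose(url, headers=None, connect_timeout=3.05, read_timeout=5):
    """Hypothetical helper: report which phase of a GET request failed."""
    try:
        response = requests.get(
            url,
            headers=headers,
            # (connect, read) tuple: separate limits for opening the
            # connection and for waiting on data from the server.
            timeout=(connect_timeout, read_timeout),
        )
    except requests.exceptions.ConnectTimeout:
        # Caught before ConnectionError because ConnectTimeout subclasses it.
        return "connect timeout: could not open a connection in time"
    except requests.exceptions.ReadTimeout:
        return "read timeout: connected, but the server sent nothing in time"
    except requests.exceptions.ConnectionError as exc:
        return f"connection error: {exc}"
    return f"HTTP {response.status_code}"
```

For example, comparing diagnose("https://es.farnell.com/") with and without the headers dictionary above shows whether the headers are what makes the difference.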

Hope this helps.

answered Oct 03 '22 23:10

Nazim Kerimbekov

Tags:

python

python-requests
