Requests.get in Python using "User-Agent" not simulating a browser request

Tags: python, python-requests

I have to collect information from web pages using Python from a Linux terminal. It works wonderfully, but some pages (not all of them) return invalid URLs when I use requests.get, because they have user-agent detectors and don't know how to answer my request (I'm not a browser or a mobile application; I'm running from a Linux terminal).

Setting the "User-Agent" header didn't work either. I tried several different ways of sending it to make it look like I am a Mozilla browser:

user_agent = {'User-Agent': 'Mozilla/5.0'}

user_agent = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; hu-HU; rv:1.7.8) Gecko/20050511 Firefox/1.0.4'}

or many other combinations.

On some servers, when I try to use this line:

page = requests.get(url, headers=user_agent)

I get a bad request, because these servers try to send me a web page for desktop or mobile browsers and fail to identify which one I am.
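The kind of check described above can be illustrated with a toy, hypothetical server that answers 400 unless the User-Agent starts with "Mozilla/5.0". This is a minimal sketch assuming a simple prefix test; the handler name UASniffingHandler is made up, and real sites usually inspect many more signals, which is why a Mozilla string alone may still be rejected. Everything runs on 127.0.0.1, so no external site is contacted.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class UASniffingHandler(BaseHTTPRequestHandler):
    """Hypothetical server: answers 200 only if the User-Agent looks like a browser."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        # Assumed detection rule for this sketch: a plain prefix test.
        status = 200 if ua.startswith("Mozilla/5.0") else 400
        body = f"{status} for User-Agent: {ua}\n".encode()
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS pick a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), UASniffingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

session = requests.Session()
session.trust_env = False  # ignore HTTP(S)_PROXY so we reach the local server

plain = session.get(url)  # sends requests' default python-requests/<version>
browser_like = session.get(url, headers={"User-Agent": "Mozilla/5.0"})
print(plain.status_code, plain.text.strip())
print(browser_like.status_code, browser_like.text.strip())

server.shutdown()
server.server_close()
```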

Am I doing something wrong by sending a User-Agent this way? I tried my code in a Python notebook and it works perfectly, because (of course) I'm currently sending the request from a browser.
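One way to check whether the header is sent the way you expect is to prepare the request offline. requests.get() creates a Session internally and prepares the request through it, so Session.prepare_request shows the headers it would send: the dict passed as headers= replaces requests' default User-Agent (python-requests/<version>), while the other defaults such as Accept and Accept-Encoding are still sent. The URL below is a placeholder and is never contacted.

```python
import requests

user_agent = {"User-Agent": "Mozilla/5.0"}
url = "http://example.com/"  # placeholder; nothing is fetched

with requests.Session() as session:
    # Merge the session's default headers with the per-request ones, exactly
    # as requests.get(url, headers=user_agent) would, without any network I/O.
    prepared = session.prepare_request(
        requests.Request("GET", url, headers=user_agent)
    )

print("default UA:", requests.utils.default_user_agent())
for name, value in prepared.headers.items():
    print(f"{name}: {value}")
```

If the printed User-Agent is the one you set, the header itself is going out correctly, and it is its value (or something else about the request) that the server objects to.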


asked May 26 '14 21:05

Maximiliano Rios

1 Answer

You are using a very old user agent, and some sites will indeed block you because of it.

>>> import requests
>>> header = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0',}
>>> url = 'http://www.w3.org/'
>>> r = requests.get(url, headers=header)
>>> r.headers
CaseInsensitiveDict({'content-length': '40737', 'content-location': 'Home.html', 'accept-ranges': 'bytes', 'expires': 'Tue, 24 Jun 2014 04:44:36 GMT', 'vary': 'negotiate,accept', 'server': 'Apache/2', 'tcn': 'choice', 'last-modified': 'Mon, 23 Jun 2014 11:15:15 GMT', 'etag': '"9f21-4fc7ef51956c0;89-3f26bd17a2f00"', 'cache-control': 'max-age=600', 'date': 'Tue, 24 Jun 2014 04:34:36 GMT', 'p3p': 'policyref="http://www.w3.org/2001/05/P3P/p3p.xml"', 'content-type': 'text/html; charset=utf-8'})
>>> r.request.headers
CaseInsensitiveDict({'Accept-Encoding': 'gzip, deflate, compress', 'Accept': '*/*', 'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'})
>>>
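To send an up-to-date User-Agent on every request without repeating headers=, the headers can be set once on a requests.Session. This is a minimal sketch assuming the Firefox 32 string from the session above plus Accept and Accept-Language values that are illustrative only; make_session is a hypothetical helper. User-Agent strings age, so you would substitute the string of a current browser. Nothing is fetched here; prepare_request only shows what would be sent.

```python
import requests

# Illustrative values: the User-Agent from the answer above plus a few headers
# browsers commonly send. Whether a given site accepts them is not guaranteed.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}


def make_session(headers=BROWSER_HEADERS):
    """Hypothetical helper: a Session whose requests all carry `headers`."""
    session = requests.Session()
    session.headers.update(headers)  # overrides requests' defaults
    return session


session = make_session()
# Inspect what would be sent, without a network call.
prepared = session.prepare_request(requests.Request("GET", "http://www.w3.org/"))
for name in ("User-Agent", "Accept", "Accept-Language"):
    print(name, "=", prepared.headers[name])

# Real use would be: r = session.get("http://www.w3.org/")
```

Headers passed with headers= on an individual call still override the session's values for that call only.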


answered Oct 01 '22 17:10

karlcow

