 

urllib3 download a file using specified user agent

Tags:

python

urllib3

What is the correct way to update the user agent information in urllib3?

How can I check that the user agent information was indeed changed and is being used?

For example:

import urllib3

user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
http = urllib3.PoolManager(10, headers=user_agent)

r1 = http.request('GET', 'http://example.com/')
if r1.status == 200:
    # r1.data is bytes, so open the file in binary mode
    with open('somefile', 'wb') as f:
        f.write(r1.data)

When I create the PoolManager http and inspect it with dir(http), I can see that http.headers is empty by default and contains the specified user agent when one is passed in. But is it actually being used? Is there any way to check without having to look at the Apache logs?

And when I actually check /var/log/apache2/access.log after trying to set the user agent:

>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0'}
>>> http = urllib3.PoolManager(2, headers=user_agent)
>>> r = http.request('GET','localhost')
>>> with open('/var/log/apache2/access.log','r') as f:
...     last_line = f.readlines()[-1]
... 
>>> last_line
'127.0.0.1 - - [08/Dec/2014:20:42:04 -0500] "GET / HTTP/1.1" 200 461 "-" "-"\n'
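The last quoted field of that log line is where Apache records the User-Agent, and here it is "-", which typically means no User-Agent value was logged for the request. As a minimal stdlib-only sketch, assuming Apache's default "combined" LogFormat (the line above has that shape), the field can be extracted programmatically instead of read by eye. The regex and the helper name user_agent_from_log_line are hypothetical illustrations, and the simple pattern does not handle escaped quotes inside fields.

```python
import re

# Assumed Apache "combined" LogFormat:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def user_agent_from_log_line(line):
    """Hypothetical helper: return the User-Agent field of one log line, or None."""
    m = COMBINED.match(line)
    return m.group("user_agent") if m else None


last_line = '127.0.0.1 - - [08/Dec/2014:20:42:04 -0500] "GET / HTTP/1.1" 200 461 "-" "-"\n'
ua = user_agent_from_log_line(last_line)
print(ua)  # "-": no User-Agent was recorded for this request
```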
jmunsch asked Dec 09 '14 01:12




1 Answer

The keyword argument is headers (plural), not header:

http = urllib3.PoolManager(10, headers=user_agent)
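As a minimal sketch, assuming a reasonably recent urllib3, the same headers dict can be built with urllib3.make_headers, whose user_agent parameter produces a {'user-agent': ...} entry. The dict passed as headers= is stored on the pool as http.headers (the attribute seen via dir(http) in the question), and urllib3 documents these as headers to include with all requests unless other headers are given explicitly. The name UA is only illustrative.

```python
import urllib3

UA = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"

# make_headers(user_agent=...) returns a dict with a single 'user-agent' key
user_agent = urllib3.make_headers(user_agent=UA)
http = urllib3.PoolManager(10, headers=user_agent)

# The pool keeps the dict as its default request headers
print(http.headers)
```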

You can confirm that the headers were actually sent by using a service such as httpbin.org, whose /headers endpoint echoes back the request headers it received:

>>> import urllib3
>>> user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 6.3; rv:36.0) ..'}
>>> http = urllib3.PoolManager(10, headers=user_agent)
>>> r1 = http.urlopen('GET', 'http://httpbin.org/headers')
>>> print(r1.data)
{
  "headers": {
    "Accept-Encoding": "identity",
    "Connect-Time": "1",
    "Connection": "close",
    "Host": "httpbin.org",
    "Total-Route-Time": "0",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0",
    "Via": "1.1 vegur",
    "X-Request-Id": "5ef53f21-6caf-4e45-8123-98e417cd05ba"
  }
}

Alternatively, you can use a packet analyzer (e.g. Wireshark) to inspect the outgoing request.
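If neither an external site nor a packet capture is convenient, a throwaway local server can report what urllib3 actually sent. This is a sketch, not part of urllib3: the server side uses only the standard library's http.server, and the handler name EchoUserAgent and the seen list are hypothetical. Binding to port 0 lets the OS pick a free port, so nothing else needs to be running.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3

UA = "Mozilla/5.0 (Windows NT 6.3; rv:36.0) Gecko/20100101 Firefox/36.0"
seen = []  # User-Agent values received by the test server


class EchoUserAgent(BaseHTTPRequestHandler):
    """Hypothetical handler: records the User-Agent and echoes it in the body."""

    def do_GET(self):
        ua = self.headers.get("User-Agent")  # header lookup is case-insensitive
        seen.append(ua)
        body = (ua or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging to stderr


server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)  # port 0: OS picks a free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

http = urllib3.PoolManager(10, headers={"user-agent": UA})
r = http.request("GET", "http://127.0.0.1:%d/" % port)
print(r.status, r.data.decode("utf-8"))

server.shutdown()
server.server_close()
```

The same server can be pointed at a PoolManager created without headers= to see what urllib3 sends by default.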

falsetru answered Oct 15 '22 09:10