Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Urlretrieve and User-Agent? - Python

Tags:

python

urllib

I'm using urlretrieve from the urllib module.

I cannot seem to find how to add a User-Agent description to my requests.


Is it possible with urlretrieve? or do I need to use another method?

like image 263
RadiantHex Avatar asked Mar 02 '10 16:03

RadiantHex


People also ask

What does Urllib request Urlretrieve do?

What does Urllib request Urlretrieve do? In line 14, the urllib. request. urlretrieve() function is used to retrieve the image from the given url and store it to the required file directory.

What is Urllib Python?

Urllib package is the URL handling module for python. It is used to fetch URLs (Uniform Resource Locators). It uses the urlopen function and is able to fetch URLs using a variety of different protocols. Urllib is a package that collects several modules for working with URLs, such as: urllib.


3 Answers

First, set version:

urllib.URLopener.version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 SE 2.X MetaSr 1.0'

Then:

filename, headers = urllib.urlretrieve(url)
like image 119
mcbill Avatar answered Nov 01 '22 06:11

mcbill


I know this issue had been there for 7 years. And I reached this issue by trying to figure out how to change the User-Agent while using urlretrieve function.

To anyone who reached this issue by no luck, here is how I did:

    # proxy = ProxyHandler({'http': 'http://192.168.1.31:8888'})
    proxy = ProxyHandler({})
    opener = build_opener(proxy)
    opener.addheaders = [('User-Agent','Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30')]
    install_opener(opener)

    result = urlretrieve(url=file_url, filename=file_name)

The reason I added proxy is to monitor the traffic in Charles, and here is the traffic I got:

See the User-Agent

like image 42
Tonny Xu Avatar answered Nov 01 '22 05:11

Tonny Xu


You can use URLopener or FancyURLopener classes. The 'version' argument specifies the user agent of the opener object.

opener = FancyURLopener({}) 
opener.version = 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.69 Safari/537.36'
opener.retrieve('http://example.com', 'index.html')
like image 5
d.rey Avatar answered Nov 01 '22 07:11

d.rey