Setting the referrer URL in Python's urllib.urlretrieve

Tags:

python

html

I am using urllib.urlretrieve in Python to download websites. However, some websites seem to refuse the download unless the request carries a proper referrer from their own site. Does anybody know how I can set a referrer, either with one of Python's standard libraries or with an external one?

Recursion asked Jan 20 '10 02:01


2 Answers

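import urllib2
# Build a Request so custom headers can be attached before sending.
req = urllib2.Request('http://www.example.com/')
# HTTP spells the header name "Referer" (single "r").
req.add_header('Referer', 'http://www.python.org/')
# urlopen accepts a Request object and returns a file-like response.
r = urllib2.urlopen(req)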

Adapted from http://docs.python.org/library/urllib2.html

Dyno Fu answered Oct 01 '22 13:10


urllib makes it hard to send arbitrary headers with the request; you could use urllib2, which lets you build and send a Request object with arbitrary headers (including, of course, the alas sadly misspelled ;-) Referer). It doesn't offer urlretrieve, but it's easy to just urlopen as you wish and copy the resulting file-like object to disk if you want (directly, or e.g. via shutil functions).
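
As a rough sketch of the "urlopen, then copy to disk" idea, here is one way it might look. This assumes Python 3, where urllib2's Request and urlopen live in urllib.request; the helper names make_request and save_response are made up for illustration and are not part of any library.

```python
import shutil
import urllib.request


def make_request(url, referer):
    """Build a Request that carries a Referer header (hypothetical helper)."""
    # Request accepts arbitrary headers, which urlretrieve has no argument for.
    return urllib.request.Request(url, headers={"Referer": referer})


def save_response(request, dest_path):
    """Open the request and stream the response body to dest_path."""
    # urlopen returns a file-like object; copyfileobj copies it to disk
    # in chunks rather than loading the whole body into memory.
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

Something like save_response(make_request('http://www.example.com/', 'http://www.python.org/'), 'page.html') would then stand in for the urlretrieve call; the URLs and file name here are placeholders.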

Alex Martelli answered Oct 01 '22 14:10