
Python follow redirects and then download the page?

I have the following Python script, and it works beautifully.

    import urllib2

    url = 'http://abc.com'  # write the url here

    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()

    print data

However, some of the URLs I give it may redirect two or more times. How can I have Python wait for the redirects to complete before loading the data? For instance, when using the above code with

http://www.google.com/search?hl=en&q=KEYWORD&btnI=1 

which is the equivalent of hitting the "I'm Feeling Lucky" button on a Google search, I get:

    >>> url = 'http://www.google.com/search?hl=en&q=KEYWORD&btnI=1'
    >>> usick = urllib2.urlopen(url)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
        response = meth(req, response)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
        'http', request, response, code, msg, hdrs)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
        return self._call_chain(*args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 403: Forbidden
    >>>

I've tried the (url, data, timeout) arguments; however, I am unsure what to put there.

EDIT: I actually found out that if I don't follow the redirect and just read the headers of the first response, I can grab the location of the next redirect and use that as my final link.
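For anyone wanting to try the approach from the EDIT, here is a rough sketch of reading the redirect target without following it. It assumes Python 3, where `urllib.request` takes the place of `urllib2`. The names `NoRedirect`, `REDIRECT_CODES` and `next_location` are made up for this illustration and are not part of the original script.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = (301, 302, 303, 307, 308)


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to build a follow-up request,
        # so the 3xx response surfaces as an HTTPError instead.
        return None


def next_location(url, timeout=10):
    """Return the URL that `url` redirects to, or None if it does not redirect."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return None  # the server answered directly, no redirect
    except urllib.error.HTTPError as err:
        if err.code not in REDIRECT_CODES:
            raise  # a real error such as 403 or 404
        location = err.headers.get("Location")
        err.close()
        # The Location header may be relative, so resolve it against `url`.
        return urljoin(url, location) if location else None
```

Calling `next_location(url)` repeatedly until it returns `None` would walk the redirect chain one hop at a time. A server that rejects the first request outright (for example with a 403) will still raise `HTTPError` here.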

Cripto asked Jan 11 '12 22:01




1 Answer

You might be better off with the Requests library, which has a better API for controlling redirect handling:

https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history

Requests:

https://pypi.org/project/requests/ (urllib replacement for humans)
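As a minimal sketch of what that could look like (assuming Python 3 and that Requests is installed), the function below follows redirects and reports each hop through `response.history`. The function name `fetch_following_redirects` and its `session` parameter are invented for this example; `allow_redirects`, `history`, `url`, `text` and `raise_for_status` are the library's own.

```python
def fetch_following_redirects(url, session=None, timeout=10):
    """Fetch `url`, following redirects; return (final_url, hops, body)."""
    if session is None:
        # Third-party dependency (pip install requests). Imported here so the
        # function can also be driven by any object with a compatible .get().
        import requests
        session = requests.Session()

    # allow_redirects=True is already the default for GET; spelled out for clarity.
    response = session.get(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx such as a 403

    # response.history lists the intermediate redirect responses, oldest first.
    hops = [(r.status_code, r.url) for r in response.history]
    return response.url, hops, response.text
```

A call such as `final_url, hops, body = fetch_following_redirects('http://abc.com')` would then give the page content in `body` and the chain of redirects in `hops`. Passing `allow_redirects=False` instead returns the first response untouched, so its `Location` header can be inspected by hand.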

Mikko Ohtamaa answered Oct 13 '22 16:10