Python follow redirects and then download the page?

I have the following Python script and it works beautifully.

    import urllib2

    url = 'http://abc.com'  # write the url here

    usock = urllib2.urlopen(url)
    data = usock.read()
    usock.close()

    print data

However, some of the URLs I give it may redirect two or more times. How can I have Python follow the redirects to completion before loading the data? For instance, when using the above code with

http://www.google.com/search?hl=en&q=KEYWORD&btnI=1

which is the equivalent of hitting the "I'm Feeling Lucky" button on a Google search, I get:

    >>> url = 'http://www.google.com/search?hl=en&q=KEYWORD&btnI=1'
    >>> usick = urllib2.urlopen(url)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
        response = meth(req, response)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
        'http', request, response, code, msg, hdrs)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
        return self._call_chain(*args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
        result = func(*args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
        raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
    urllib2.HTTPError: HTTP Error 403: Forbidden
    >>>

I've tried passing the (url, data, timeout) arguments; however, I am unsure what to put there.

EDIT: I actually found out that if I don't follow the redirect and just read the headers of the first response, I can grab the location of the next redirect and use that as my final link.
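
For anyone wanting to try the header approach from the EDIT, here is a minimal sketch. It assumes Python 3's urllib.request rather than the urllib2 module used above, and the names NoRedirect and first_redirect_location are made up for illustration. The idea is to stop urllib from following the redirect, so the 3xx response surfaces as an HTTPError whose headers carry the Location value.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "do not follow"; urllib then raises HTTPError
        # for the 3xx response instead of requesting the new URL.
        return None


def first_redirect_location(url, timeout=10):
    """Return the raw Location header of the first response, or None if
    the server answered without redirecting."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout):
            return None  # plain success response, nothing to follow
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            return err.headers.get("Location")
        raise  # a genuine error such as 403 or 404
```

The returned value may be a relative URL, so it probably needs joining against the original URL (for example with urllib.parse.urljoin) before it is requested. This only peeks at the first hop; a chain of several redirects would need the call repeated.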

asked Jan 11 '12 22:01

Cripto

1 Answer

You might be better off with the Requests library, which has a better API for controlling redirect handling:

https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history

Requests:

https://pypi.org/project/requests/ (urllib replacement for humans)
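
As a rough illustration of what that looks like, here is a small sketch, assuming the third-party requests package is installed; the function name fetch_following_redirects is hypothetical. Requests follows redirects for GET by default, keeps the intermediate responses in response.history, and exposes the final address as response.url.

```python
import requests


def fetch_following_redirects(url, timeout=10):
    """Fetch url, following any redirects.

    Returns (final_url, hops, body_text), where hops is a list of
    (status_code, url) pairs for each intermediate redirect response.
    """
    # allow_redirects=True is already the default for GET; it is spelled
    # out here only to make the behaviour explicit.
    response = requests.get(url, allow_redirects=True, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx answers into exceptions
    hops = [(r.status_code, r.url) for r in response.history]
    return response.url, hops, response.text
```

Passing allow_redirects=False instead should return the first response untouched, so its Location header can be read from response.headers.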

answered Oct 13 '22 16:10

Mikko Ohtamaa

Tags: python, html, web-scraping
