
How to check whether urllib2 followed a redirect?

I've written this function:

import urllib2

def download_mp3(url, name):
    # Fetch the whole response body into memory.
    opener1 = urllib2.build_opener()
    page1 = opener1.open(url)
    mp3 = page1.read()
    # Save it as <name>.mp3 in binary mode.
    filename = name + '.mp3'
    with open(filename, 'wb') as fout:
        fout.write(mp3)

This function takes a URL and a name, both as strings. It then downloads the MP3 from the URL and saves it under the given name.

The URL has the form http://site/download.php?id=xxxx, where xxxx is the id of an MP3.

If this id does not exist, the site redirects me to another page.

So the question is: how can I check whether this id exists? I've tried to check whether the URL exists with a function like this:

import httplib
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400

But it doesn't seem to work.

Thank you

asked Dec 07 '11 11:12 by gaggina




2 Answers

Something like this, then check the response code:

import urllib2, urllib

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    # Instead of following the redirect, return the 3xx response itself.
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    # Treat the other redirect statuses the same way.
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

# Install the handler globally so urllib2.urlopen() uses it.
opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)
response = urllib2.urlopen('http://google.com')
if response.code in (300, 301, 302, 303, 307):
    print('redirect')
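
Applied to the download function from the question, the same idea could look roughly like the sketch below. It is only a sketch under a few assumptions: `fetch_unless_redirected` and `REDIRECT_CODES` are hypothetical names, the handler simply hands back the 3xx response it receives rather than rebuilding it with `addinfourl`, and the `try`/`except` import is added so the snippet also runs on Python 3, where `urllib2` became `urllib.request`.

```python
try:
    import urllib2 as urlreq          # Python 2
except ImportError:
    import urllib.request as urlreq   # Python 3 (assumption: same handler API)

# Redirect statuses handled here, matching the list in the answer above.
REDIRECT_CODES = (300, 301, 302, 303, 307)


class NoRedirectHandler(urlreq.HTTPRedirectHandler):
    """Return the 3xx response itself instead of following it."""

    def http_error_302(self, req, fp, code, msg, headers):
        # fp is the response object for the redirect; hand it back unchanged.
        return fp

    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302


def fetch_unless_redirected(url):
    """Return the body at url, or None if the server answered with a redirect."""
    opener = urlreq.build_opener(NoRedirectHandler())
    response = opener.open(url)
    try:
        if response.getcode() in REDIRECT_CODES:
            return None
        return response.read()
    finally:
        response.close()
```

In `download_mp3`, writing the file only when this helper returns something other than `None` would skip ids that do not exist. Using a local opener instead of `install_opener` keeps the no-redirect behaviour from affecting other `urlopen` calls.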
answered Oct 20 '22 21:10 by polymetr


My answer to this looked like:

import urllib2

req = urllib2.Request(url)
try:
    response = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    # The server answered with an error status; handle it here or re-raise.
    raise
else:
    # geturl() gives the final URL after any redirects were followed.
    if response.geturl() != url:
        print 'We have redirected!'
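
Since the site in the question answers unknown ids with a redirect rather than an error, the URL comparison can be folded into the download itself. A rough sketch, where `was_redirected` and `download_if_exists` are hypothetical names and the import fallback is an assumption so the snippet also runs on Python 3:

```python
try:
    import urllib2 as urlreq          # Python 2
except ImportError:
    import urllib.request as urlreq   # Python 3


def was_redirected(requested_url, final_url):
    """True if the opener ended up somewhere other than the URL it was asked for."""
    return requested_url != final_url


def download_if_exists(url, filename):
    """Save the body at url to filename; return False if a redirect was followed."""
    response = urlreq.urlopen(url)
    try:
        # geturl() reports the URL of the page that was actually retrieved.
        if was_redirected(url, response.geturl()):
            return False
        with open(filename, 'wb') as fout:
            fout.write(response.read())
        return True
    finally:
        response.close()
```

A plain string comparison is the simplest check, but it may report a redirect when the server only normalises the URL (for example, by adding a trailing slash), so it is worth trying against the real site first.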
answered Oct 20 '22 20:10 by kSiR