Return last URL in sequence of redirects

I sometimes need to fetch pages with Requests and parse them with Beautiful Soup, starting from URLs such as these:

http://bit.ly/sdflksdfwefwe

http://stup.id/sdfslkjsfsd

http://0.r.msn.com/sdflksdflsdj

Of course, these URLs generally 'resolve' to a canonical URL such as http://real-website.com/page.html. How can I get the last URL in the resolution / redirect chain?

My code generally looks like this:

from bs4 import BeautifulSoup
import requests

response = requests.get(url)
# Pass the raw bytes: from_encoding is ignored when the markup is already-decoded text
soup = BeautifulSoup(response.content, "html.parser", from_encoding=response.encoding)
canonical_url = response.??? ## This is what I need to know

Note that I don't mean to query http://bit.ly/bllsht separately just to see where it goes. Rather, since I am already using Beautiful Soup to parse the page that it returns, I also want the canonical URL that was last in the redirect chain.

Thanks.

dotancohen asked Jun 12 '13 09:06

1 Answer

It's in the url attribute of your response object.

>>> response = requests.get('http://bit.ly/bllsht')
>>> response.url
u'http://www.thenews.org/sports/well-hey-there-murray-state-1-21-11-1.2436937'

You could easily find this information in the “Quick Start” page.
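If you also want the intermediate hops, Requests keeps the redirect responses in response.history (oldest first), with response.url holding the final address. Here is a minimal sketch of reading both; the helper names redirect_chain and resolve are hypothetical, and the stand-in object only mimics the two attributes of a real Response so the example can run without network access.

```python
from types import SimpleNamespace


def redirect_chain(response):
    """Return every URL visited, in order, ending with the final one."""
    # response.history holds the intermediate redirect responses (oldest first);
    # response.url is the URL of the final response.
    return [hop.url for hop in response.history] + [response.url]


def resolve(url, timeout=10):
    """Fetch url with Requests and return (final_url, chain). Hypothetical helper."""
    import requests  # imported here so redirect_chain() can be used without it

    response = requests.get(url, timeout=timeout)  # GET follows redirects by default
    return response.url, redirect_chain(response)


# Offline illustration: a stand-in shaped like a Response (assumption, not real data).
fake = SimpleNamespace(
    url="http://real-website.com/page.html",
    history=[SimpleNamespace(url="http://bit.ly/sdflksdfwefwe")],
)
print(redirect_chain(fake))
```

With a live connection, resolve(url) would return the same kind of list for a real short link, and the response you pass to Beautiful Soup is the one whose url attribute is the last entry.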

kirelagin answered Sep 28 '22 06:09