
Link with status code 200 redirects

I have a link that returns status code 200, but when I open it in a browser it redirects.

When I fetch the same link with Python Requests, I only get the data from the original link, not from the page the browser ends up on. I tried both Python Requests and urllib with no success.

  1. How can I capture the final URL and its data?

  2. How can a link with status 200 redirect?

>>> import requests
>>> url = 'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r = requests.get(url)
>>> r.url
'http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18'
>>> r.history
[]
>>> r.status_code
200

This is the link

Redirected link

Nandesh asked Nov 23 '25 18:11



2 Answers

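This is not an HTTP redirect. The server answers with status 200 and a small HTML page whose meta refresh tag tells the browser to load another URL. The redirect therefore happens on the client side, and requests.get(...) never sees it: r.history stays empty and r.url is unchanged. The original URL has the following page source: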

<html>
    <head>
        <meta http-equiv="refresh" content="0;URL=http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18">
        <script type="text/javascript" src="http://gc.kis.v2.scr.kaspersky-labs.com/D5838D60-3633-1046-AA3A-D5DDF145A207/main.js" charset="UTF-8"></script>
    </head>
    <body bgcolor="#FFFFFF"></body>
</html>

Here, you can see the redirected URL in the content attribute of the meta tag. Your job is to extract it and then request it. You can do that with a regex, or simply with a couple of string split operations.

For example:

import requests

r = requests.get('http://www.afaqs.com/news/story/52344_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18')
redirected_url = r.text.split('URL=')[1].split('">')[0]
print(redirected_url)
# http://www.afaqs.com/interviews/index.html?id=572_The-target-is-to-get-advertisers-to-switch-from-print-to-TV-Ravish-Kumar-Viacom18

r = requests.get(redirected_url)
# Start scraping from this link...

Or, using a regex:

import re

# Non-greedy match, so it stops at the first '">' after the URL
redirected_url = re.findall(r'URL=(http.*?)">', r.text)[0]
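
To answer the first question (capturing the final URL and its data), the extract-then-request step can be wrapped in a loop. This is only a rough sketch, assuming the page uses a meta refresh tag like the one shown above. The names meta_refresh_target, follow_meta_refresh, fetch_with_requests and max_hops are made up for illustration and are not part of requests.

```python
import re
from urllib.parse import urljoin

# Hypothetical helpers for illustration; none of this is part of requests.
META_TAG = re.compile(r'<meta\b[^>]*>', re.IGNORECASE)
HTTP_EQUIV_REFRESH = re.compile(r'http-equiv\s*=\s*["\']?refresh', re.IGNORECASE)
REFRESH_URL = re.compile(r'url\s*=\s*["\']?([^"\'>\s]+)', re.IGNORECASE)


def meta_refresh_target(html, base_url):
    """Return the absolute URL of the first meta refresh in html, or None."""
    for tag in META_TAG.findall(html):
        if HTTP_EQUIV_REFRESH.search(tag):
            match = REFRESH_URL.search(tag)
            if match:
                # urljoin resolves relative targets against the current page
                return urljoin(base_url, match.group(1))
    return None


def follow_meta_refresh(url, fetch, max_hops=5):
    """Fetch url and keep following meta refresh tags.

    fetch is any callable that takes a URL and returns the page text.
    Returns (final_url, final_html); raises if more than max_hops pages
    in a row ask for another refresh.
    """
    for _ in range(max_hops):
        html = fetch(url)
        target = meta_refresh_target(html, url)
        if target is None or target == url:
            return url, html
        url = target
    raise RuntimeError("too many meta refresh hops")


def fetch_with_requests(url):
    # Imported here so the parsing helpers above work without requests.
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

Calling follow_meta_refresh(url, fetch_with_requests) should then give back the URL of the last page together with its HTML. Passing the fetch function in as an argument keeps the loop easy to try out on canned pages without touching the network.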
Keyur Potdar answered Nov 26 '25 11:11



URLs like this live in the page markup (here in a meta refresh tag, in other cases in JavaScript inside a script tag), not in the HTTP response headers. Therefore Python's HTTP libraries do not follow them.

To get the link, simply extract it from the respective tag.
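
Extracting from the tag can also be done with the standard library's HTML parser instead of string operations, which copes with attribute order, quoting style and upper-case markup. A minimal sketch, assuming the target sits in a meta refresh tag; MetaRefreshParser and extract_meta_refresh are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Records the URL of the first <meta http-equiv="refresh"> tag seen."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta" or self.refresh_url is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "refresh":
            return
        # content looks like "0;URL=http://example.com/next"
        _, _, rest = attributes.get("content", "").partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url" and value:
            self.refresh_url = value.strip().strip("'\"")


def extract_meta_refresh(html, base_url):
    """Return the absolute meta refresh target in html, or None if absent."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.refresh_url is None:
        return None
    # Resolve relative targets against the URL the page was fetched from.
    return urljoin(base_url, parser.refresh_url)
```

With a response r from requests.get(url), extract_meta_refresh(r.text, r.url) should give the URL to request next, or None when the page has no meta refresh. Redirects written in JavaScript (for example an assignment to window.location) are not covered by this sketch.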

Deepshikha Sethi answered Nov 26 '25 09:11
