Getting the Redirected URL from the Original URL

I have a table in my database which contains the URLs of some websites. I have to open those URLs and verify some links on those pages. The problem is that some URLs get redirected to other URLs. My logic is failing for such URLs.

Is there a way to pass in my original URL string and get the redirected URL back?

Example: I am trying with this URL: http://individual.troweprice.com/public/Retail/xStaticFiles/FormsAndLiterature/CollegeSavings/trp529Disclosure.pdf

It gets redirected to this one: http://individual.troweprice.com/staticFiles/Retail/Shared/PDFs/trp529Disclosure.pdf

I tried the following code:

    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(Uris);
    req.Proxy = proxy;
    req.Method = "HEAD";
    req.AllowAutoRedirect = false;

    HttpWebResponse myResp = (HttpWebResponse)req.GetResponse();
    if (myResp.StatusCode == HttpStatusCode.Redirect)
    {
        MessageBox.Show("redirected to:" + myResp.GetResponseHeader("Location"));
    }

When I execute the code above, it returns HttpStatusCode.OK. I don't understand why it isn't treated as a redirect. If I open the link in Internet Explorer, it redirects to another URL and opens the PDF file.

Can someone help me understand why it is not working properly for the example URL?

By the way, I checked with Hotmail's URL (http://www.hotmail.com) and it correctly returns the redirected URL.

asked Apr 01 '09 10:04

user85594

2 Answers

This function returns the final destination of a link, even if there are multiple redirects. It doesn't account for JavaScript-based redirects or META redirects. Note that the previous solution didn't handle relative URLs: the Location header can return something like "/newhome", which you need to combine with the URL that served that response to get the full destination URL.

    // Requires: using System; using System.Net;
    public static string GetFinalRedirect(string url)
    {
        if(string.IsNullOrWhiteSpace(url))
            return url;

        int maxRedirCount = 8;  // prevent infinite loops
        string newUrl = url;
        do
        {
            HttpWebRequest req = null;
            HttpWebResponse resp = null;
            try
            {
                req = (HttpWebRequest) HttpWebRequest.Create(url);
                req.Method = "HEAD";
                req.AllowAutoRedirect = false;
                resp = (HttpWebResponse)req.GetResponse();
                switch (resp.StatusCode)
                {
                    case HttpStatusCode.OK:
                        return newUrl;
                    case HttpStatusCode.Redirect:          // 302
                    case HttpStatusCode.MovedPermanently:  // 301
                    case HttpStatusCode.RedirectKeepVerb:  // 307
                    case HttpStatusCode.RedirectMethod:    // 303
                    case (HttpStatusCode)308:              // Permanent Redirect; cast because older frameworks lack a named member
                        newUrl = resp.Headers["Location"];
                        if (newUrl == null)
                            return url;

                        if (newUrl.IndexOf("://", System.StringComparison.Ordinal) == -1)
                        {
                            // No URL scheme, so this is a relative or root-relative path;
                            // resolve it against the URL that served this response
                            Uri u = new Uri(new Uri(url), newUrl);
                            newUrl = u.ToString();
                        }
                        break;
                    default:
                        return newUrl;
                }
                url = newUrl;
            }
            catch (WebException)
            {
                // Return the last known good URL
                return newUrl;
            }
            catch (Exception)
            {
                return null;
            }
            finally
            {
                if (resp != null)
                    resp.Close();
            }
        } while (maxRedirCount-- > 0);

        return newUrl;
    }

answered Sep 22 '22 11:09

Marcelo Calbucci

The URL you mentioned uses a JavaScript redirect, which will only redirect a browser. So there's no easy way to detect the redirect.

For proper (HTTP Status Code and Location:) redirects, you might want to remove

    req.AllowAutoRedirect = false;

and get the final URL using

    myResp.ResponseUri

as there can be more than one redirect.
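
As a minimal sketch of that suggestion (the class and method names here are hypothetical, and the proxy setup from the question is left out), something like this should return the URI that finally answered the request once HttpWebRequest has followed the HTTP redirects on its own:

```csharp
using System;
using System.Net;

public static class RedirectProbe
{
    // Hypothetical helper: returns the URI that actually served the response
    // after any 3xx redirects were followed automatically.
    public static Uri GetResponseUri(string url)
    {
        var req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "HEAD";
        // AllowAutoRedirect is left at its default (true), so 3xx responses
        // with a Location header are followed by the framework.
        using (var resp = (HttpWebResponse)req.GetResponse())
        {
            return resp.ResponseUri;
        }
    }
}
```

GetResponse throws a WebException for error responses, so a caller would probably want to wrap this in a try/catch. It still won't see JavaScript or META redirects, for the reasons explained in this answer.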

UPDATE: More clarification regarding redirects:

There's more than one way to redirect a browser to another URL.

The first way is to use a 3xx HTTP status code, and the Location: header. This is the way the gods intended HTTP redirects to work, and is also known as "the one true way." This method will work on all browsers and crawlers.

And then there are the devil's ways. These include meta refresh, the Refresh: header, and JavaScript. Although these methods work in most browsers, they are definitely not guaranteed to work, and occasionally result in strange behavior (for example, breaking the back button).

Most web crawlers, including the Googlebot, ignore these redirection methods, and so should you. If you absolutely have to detect all redirects, then you would have to parse the HTML for META tags, look for Refresh: headers in the response, and evaluate JavaScript. Good luck with the last one.
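
If you do go down that road, the META and Refresh: parts could be roughly sketched like this. The class and method names are made up for illustration, and the regular expression is an assumption that only handles the common attribute order (http-equiv before content) with a quoted content value that has no nested quotes; a real HTML parser would be more robust.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NonHttpRedirects
{
    // Assumption: http-equiv appears before content, and content is quoted.
    private static readonly Regex MetaRefresh = new Regex(
        @"<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*content\s*=\s*[""']([^""']*)[""']",
        RegexOptions.IgnoreCase);

    // Pulls the target out of a Refresh value such as "5; url=/new".
    // Works for both the META content attribute and the Refresh: header.
    // Returns null when the value has no url= part.
    public static string TargetFromRefreshValue(string value)
    {
        if (string.IsNullOrEmpty(value))
            return null;

        Match m = Regex.Match(value, @"url\s*=\s*(.+)$", RegexOptions.IgnoreCase);
        if (!m.Success)
            return null;

        return m.Groups[1].Value.Trim().Trim('\'', '"');
    }

    // Looks for a META refresh tag in the HTML and resolves its target
    // against the page's own URI. Returns null when none is found.
    public static Uri FindMetaRefreshTarget(string html, Uri pageUri)
    {
        Match m = MetaRefresh.Match(html ?? string.Empty);
        if (!m.Success)
            return null;

        string target = TargetFromRefreshValue(m.Groups[1].Value);
        if (target == null)
            return null;

        // Uri(Uri, string) handles both relative and absolute targets.
        return new Uri(pageUri, target);
    }
}
```

For the Refresh: header, you could pass resp.Headers["Refresh"] to TargetFromRefreshValue and resolve the result against resp.ResponseUri in the same way. JavaScript redirects are not covered here at all.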

answered Sep 20 '22 11:09

Can Berk Güder
