Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to correctly canonicalize a URL in an ASP.NET MVC application?

I'm trying to find a good general purpose way to canonicalize urls in an ASP.NET MVC 2 application. Here's what I've come up with so far:

// Using an authorization filter because it is executed earlier than other filters
public class CanonicalizeAttribute : AuthorizeAttribute
{
    public bool ForceLowerCase { get;set; }

    public CanonicalizeAttribute()
        : base()
    {
        ForceLowerCase = true;
    }

    public override void OnAuthorization(AuthorizationContext filterContext)
    {
        RouteValueDictionary values = ExtractRouteValues(filterContext);
        string canonicalUrl = new UrlHelper(filterContext.RequestContext).RouteUrl(values);
        if (ForceLowerCase)
            canonicalUrl = canonicalUrl.ToLower();

        if (filterContext.HttpContext.Request.Url.PathAndQuery != canonicalUrl)
            filterContext.Result = new PermanentRedirectResult(canonicalUrl);
    }

    private static RouteValueDictionary ExtractRouteValues(AuthorizationContext filterContext)
    {
        var values = filterContext.RouteData.Values.Union(filterContext.RouteData.DataTokens).ToDictionary(x => x.Key, x => x.Value);
        var queryString = filterContext.HttpContext.Request.QueryString;
        foreach (string key in queryString.Keys)
        {
            if (!values.ContainsKey(key))
                values.Add(key, queryString[key]);
        }
        return new RouteValueDictionary(values);
    }
}

// Redirect result that uses permanent (301) redirect
public class PermanentRedirectResult : RedirectResult
{
    public PermanentRedirectResult(string url) : base(url) { }

    public override void ExecuteResult(ControllerContext context)
    {
        context.HttpContext.Response.RedirectPermanent(this.Url);
    }
}

Now I can mark up my controllers like this:

[Canonicalize]
public class HomeController : Controller { /* ... */ }

This all appears to work fairly well, but I have the following concerns:

  1. I still have to add the CanonicalizeAttribute to every controller (or action method) I want canonicalized, when it's hard to think of a situation where I won't want this behaviour. It seems like there should be a way to get this behaviour site-wide, rather than one controller at a time.

  2. The fact that I'm implementing the 'force to lower-case' rule in the filter seems wrong. Surely it would be better to somehow role this up into the route url logic, but I can't think of a way to do this in my routing configuration. I thought of adding @"[a-z]*" constraints to the controller and action parameters (as well as any other string route parameters), but I think this will cause the routes to not be matched. Also, because the lower-case rule isn't being applied at the route level, it's possible to generate links in my pages that have upper-case letters in them, which seems pretty bad.

Is there something obvious I'm overlooking here?

like image 776
Bennor McCarthy Avatar asked Sep 26 '10 09:09

Bennor McCarthy


People also ask

How do I Canonicalize a URL?

Use a rel="canonical" link tag To indicate when a page is a duplicate of another page, you can use a <link> tag in the head section of your HTML. Suppose you want https://example.com/dresses/green-dresses to be the canonical URL, even though a variety of URLs can access this content.

How does MVC handle wrong URL?

First: Add a route pattern to the end of your routing table that matches to any URL. Second: Set the default controller and action values for this pattern to the controller/action method that will display the "more helpful" View you want to provide. (And, I guess, a third step: Provide that controller/action/View.)

What is URL in ASP NET MVC?

The ASP.NET MVC framework includes a flexible URL routing system that enables you to define URL mapping rules within your applications. The routing system has two main purposes: Map incoming URLs to the application and route them so that the right Controller and Action method executes to process them.

What is self canonical URL?

A self-referencing canonical tag, as it sounds, is one that canonicals to itself. This ensures that multiple versions of the page (duplicates) don't get indexed separately. For example, the page https://www.example.com would have a rel=”canonical” tag that points to https://www.example.com (the same URL).


1 Answers

MVC 5 and 6 has the option of generating lower case URL's for your routes. My route config is shown below:

public static class RouteConfig
{
    public static void RegisterRoutes(RouteCollection routes)
    {
        // Imprive SEO by stopping duplicate URL's due to case or trailing slashes.
        routes.AppendTrailingSlash = true;
        routes.LowercaseUrls = true;

        routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

        routes.MapRoute(
            name: "Default",
            url: "{controller}/{action}/{id}",
            defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional });
    }
}

With this code, you should no longer need the canonicalize the URL's as this is done for you. One problem that can occur if you are using HTTP and HTTPS URL's and want a canonical URL for this. In this case, it's pretty easy to use the above approaches and replace HTTP with HTTPS or vice versa.

Another problem is external websites that link to your site may omit the trailing slash or add upper-case characters and for this you should perform a 301 permanent redirect to the correct URL with the trailing slash. For full usage and source code, refer to my blog post and the RedirectToCanonicalUrlAttribute filter:

/// <summary>
/// To improve Search Engine Optimization SEO, there should only be a single URL for each resource. Case 
/// differences and/or URL's with/without trailing slashes are treated as different URL's by search engines. This 
/// filter redirects all non-canonical URL's based on the settings specified to their canonical equivalent. 
/// Note: Non-canonical URL's are not generated by this site template, it is usually external sites which are 
/// linking to your site but have changed the URL case or added/removed trailing slashes.
/// (See Google's comments at http://googlewebmastercentral.blogspot.co.uk/2010/04/to-slash-or-not-to-slash.html
/// and Bing's at http://blogs.bing.com/webmaster/2012/01/26/moving-content-think-301-not-relcanonical).
/// </summary>
[AttributeUsage(AttributeTargets.Method | AttributeTargets.Class, Inherited = true, AllowMultiple = false)]
public class RedirectToCanonicalUrlAttribute : FilterAttribute, IAuthorizationFilter
{
    private readonly bool appendTrailingSlash;
    private readonly bool lowercaseUrls;

    #region Constructors

    /// <summary>
    /// Initializes a new instance of the <see cref="RedirectToCanonicalUrlAttribute" /> class.
    /// </summary>
    /// <param name="appendTrailingSlash">If set to <c>true</c> append trailing slashes, otherwise strip trailing 
    /// slashes.</param>
    /// <param name="lowercaseUrls">If set to <c>true</c> lower-case all URL's.</param>
    public RedirectToCanonicalUrlAttribute(
        bool appendTrailingSlash, 
        bool lowercaseUrls)
    {
        this.appendTrailingSlash = appendTrailingSlash;
        this.lowercaseUrls = lowercaseUrls;
    } 

    #endregion

    #region Public Methods

    /// <summary>
    /// Determines whether the HTTP request contains a non-canonical URL using <see cref="TryGetCanonicalUrl"/>, 
    /// if it doesn't calls the <see cref="HandleNonCanonicalRequest"/> method.
    /// </summary>
    /// <param name="filterContext">An object that encapsulates information that is required in order to use the 
    /// <see cref="RedirectToCanonicalUrlAttribute"/> attribute.</param>
    /// <exception cref="ArgumentNullException">The <paramref name="filterContext"/> parameter is <c>null</c>.</exception>
    public virtual void OnAuthorization(AuthorizationContext filterContext)
    {
        if (filterContext == null)
        {
            throw new ArgumentNullException("filterContext");
        }

        if (string.Equals(filterContext.HttpContext.Request.HttpMethod, "GET", StringComparison.Ordinal))
        {
            string canonicalUrl;
            if (!this.TryGetCanonicalUrl(filterContext, out canonicalUrl))
            {
                this.HandleNonCanonicalRequest(filterContext, canonicalUrl);
            }
        }
    }

    #endregion

    #region Protected Methods

    /// <summary>
    /// Determines whether the specified URl is canonical and if it is not, outputs the canonical URL.
    /// </summary>
    /// <param name="filterContext">An object that encapsulates information that is required in order to use the 
    /// <see cref="RedirectToCanonicalUrlAttribute" /> attribute.</param>
    /// <param name="canonicalUrl">The canonical URL.</param>
    /// <returns><c>true</c> if the URL is canonical, otherwise <c>false</c>.</returns>
    protected virtual bool TryGetCanonicalUrl(AuthorizationContext filterContext, out string canonicalUrl)
    {
        bool isCanonical = true;

        canonicalUrl = filterContext.HttpContext.Request.Url.ToString();
        int queryIndex = canonicalUrl.IndexOf(QueryCharacter);

        if (queryIndex == -1)
        {
            bool hasTrailingSlash = canonicalUrl[canonicalUrl.Length - 1] == SlashCharacter;

            if (this.appendTrailingSlash)
            {
                // Append a trailing slash to the end of the URL.
                if (!hasTrailingSlash)
                {
                    canonicalUrl += SlashCharacter;
                    isCanonical = false;
                }
            }
            else
            {
                // Trim a trailing slash from the end of the URL.
                if (hasTrailingSlash)
                {
                    canonicalUrl = canonicalUrl.TrimEnd(SlashCharacter);
                    isCanonical = false;
                }
            }
        }
        else
        {
            bool hasTrailingSlash = canonicalUrl[queryIndex - 1] == SlashCharacter;

            if (this.appendTrailingSlash)
            {
                // Append a trailing slash to the end of the URL but before the query string.
                if (!hasTrailingSlash)
                {
                    canonicalUrl = canonicalUrl.Insert(queryIndex, SlashCharacter.ToString());
                    isCanonical = false;
                }
            }
            else
            {
                // Trim a trailing slash to the end of the URL but before the query string.
                if (hasTrailingSlash)
                {
                    canonicalUrl = canonicalUrl.Remove(queryIndex - 1, 1);
                    isCanonical = false;
                }
            }
        }

        if (this.lowercaseUrls)
        {
            foreach (char character in canonicalUrl)
            {
                if (char.IsUpper(character))
                {
                    canonicalUrl = canonicalUrl.ToLower();
                    isCanonical = false;
                    break;
                }
            }
        }

        return isCanonical;
    }

    /// <summary>
    /// Handles HTTP requests for URL's that are not canonical. Performs a 301 Permanent Redirect to the canonical URL.
    /// </summary>
    /// <param name="filterContext">An object that encapsulates information that is required in order to use the 
    /// <see cref="RedirectToCanonicalUrlAttribute" /> attribute.</param>
    /// <param name="canonicalUrl">The canonical URL.</param>
    protected virtual void HandleNonCanonicalRequest(AuthorizationContext filterContext, string canonicalUrl)
    {
        filterContext.Result = new RedirectResult(canonicalUrl, true);
    }

    #endregion
}

Usage example to ensure all requests are 301 redirected to the correct canonical URL:

filters.Add(new RedirectToCanonicalUrlAttribute(
    RouteTable.Routes.AppendTrailingSlash, 
    RouteTable.Routes.LowercaseUrls));
like image 198
Muhammad Rehan Saeed Avatar answered Sep 20 '22 11:09

Muhammad Rehan Saeed