ASP.Net URL Encoding

I am implementing URL rewriting in ASP.NET, and my URLs are causing me a world of problems.

The URL is generated from a database of departments & categories. I want employees to be able to add items to the database with whatever special characters are appropriate without it breaking the site.

I am encoding the data before I construct the URLs.

There are several problems...

  1. IIS decodes the URL before it reaches .NET, making it impossible to properly parse anything with a "/" in it.
  2. ASP.NET gets confused by the URL, making "~" useless within certain pages.
  3. I migrated from the built-in test server to my local IIS server (XP machine), and any URL containing an encoded & (%26) gives me a "Bad Request" error.
  4. UrlEncode leaves some breaking characters untouched, such as '.'.

I did have two other related posts on this subject; at the time I only saw the small problems, not the big problem upstream. I've found some registry tricks to solve the "Bad Request" issue, but I'm going to be deploying to a shared hosting environment, which makes them useless. I also know that the "Bad Request" behavior is a fix for some security issue, so I don't want to bypass it without knowing what can of worms I'm opening.

Rather than trying to force .NET to pass me the raw URL, or override IIS settings, I'd like to make truly safe URLs in the first place.

I'll note I've tried AntiXss.UrlEncode, HttpUtility.UrlEncode, and Uri.EscapeDataString. I've even tried stupid things like double URL-encoding. Is there a utility that does what I need, or do I really need to roll my own? I'm even considering doing something hacky like replacing the % with an unusual string of characters. The end result should be at least readable, which was the point of using URL rewriting in the first place.

Sorry for the long post; I just wanted to make sure that I've included all the necessary details. I can't seem to find any relevant information on this, and it seems like it would be a common problem, so maybe I'm missing something big. Thanks for your help, and patience with the long explanation!


Edit for clarity:

When I say the URLs are being built from a database, what I mean is that the directory structure is constructed from the departments and categories in my database.

Some example URLs:

Mystore/Refrigeration/Bar+Fridge.aspx
Mystore/Cooking+Equipment.aspx
Mystore/Kitchen/Cutting+Boards.aspx

The problems come in when I use a department like "Beverage & Bar" or "Pastry/Decorating" to construct my URL. Despite being encoded first, these cause the aforementioned issues.

My handlers are already implemented and working fine except for the special character encoding issues.

Kelly Robins asked Aug 17 '09 15:08


1 Answer

You should consider adding a table, keyed to your category/department table, that stores a unique URL for each category. Then you can use a special routine to generate the URLs. This can be a SQL scalar function or a CLR function; either way, its job is to normalize the name for use in a URL. It can convert "Beverage & Bar" to "beverage-and-bar" and "Pastry / Decorating" to "pastry-decorating". Mainly, the routine needs to replace every character that is not valid in an HTTP URL with something else. An example is this:

using System.Text.RegularExpressions;

public static class URL
{
    // Measurement marks after a digit: 6' becomes 6-ft, 6'' or 6" becomes 6-in.
    // The lookahead stops a single quote from matching the first half of ''.
    static readonly Regex feet = new Regex(@"([0-9]\s?)'(?!')", RegexOptions.Compiled);
    static readonly Regex inch1 = new Regex(@"([0-9]\s?)''", RegexOptions.Compiled);
    static readonly Regex inch2 = new Regex(@"([0-9]\s?)""", RegexOptions.Compiled);
    // #12 becomes num-12, $5 becomes 5-dollar, 10% becomes 10-percent.
    static readonly Regex num = new Regex(@"#([0-9]+)", RegexOptions.Compiled);
    static readonly Regex dollar = new Regex(@"[$]([0-9]+)", RegexOptions.Compiled);
    static readonly Regex percent = new Regex(@"([0-9]+)%", RegexOptions.Compiled);
    // Characters that act as word separators and become a dash.
    static readonly Regex sep = new Regex(@"[\s_/\\+:.]", RegexOptions.Compiled);
    // Anything that is still not a letter, digit, or dash is dropped.
    static readonly Regex empty = new Regex(@"[^-A-Za-z0-9]", RegexOptions.Compiled);
    // Runs of dashes collapse to a single dash.
    static readonly Regex extra = new Regex(@"[-]+", RegexOptions.Compiled);

    public static string PrepareURL(string str)
    {
        // Invariant culture keeps the result independent of the server locale.
        str = str.Trim().ToLowerInvariant();
        // Padded with spaces so that "R&D" becomes "r-and-d" rather than "randd".
        str = str.Replace("&", " and ");

        str = feet.Replace(str, "$1-ft-");
        str = inch1.Replace(str, "$1-in-");
        str = inch2.Replace(str, "$1-in-");
        str = num.Replace(str, "num-$1");

        str = dollar.Replace(str, "$1-dollar-");
        str = percent.Replace(str, "$1-percent-");

        str = sep.Replace(str, "-");

        str = empty.Replace(str, string.Empty);
        str = extra.Replace(str, "-");

        str = str.Trim('-');
        return str;
    }
}

You could make this a SQL CLR function, or run URL generation as a separate process. Then, to implement mapping, you would map the entire URL directly to a category ID. This approach is better in the long run for several reasons. First, you are not regenerating URLs on every request: you generate them once and they stay static, so you don't have to worry about your procedure changing and Googlebot no longer being able to find the old URLs. Second, a collision may alert you to a potential duplicate category name, because two names that collide differ only by special characters. Finally, you can always view your URLs in the database without having to run the mapping function.
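As a rough sketch of the mapping and collision check described above, the class below uses an in-memory dictionary as a stand-in for the URL table. Everything in it is hypothetical: the `UrlTable` class, its `TryAdd` and `TryResolve` methods, and the simplified `Slugify` normalizer exist only to keep the example self-contained. In a real site the dictionary would be a database table with a unique index on the URL column, and the normalizer would be whatever routine you settled on.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical in-memory stand-in for the category URL table.
public class UrlTable
{
    private static readonly Regex NonAlnum =
        new Regex("[^a-z0-9]+", RegexOptions.Compiled);

    // Stored URL segment -> category ID.
    private readonly Dictionary<string, int> idsBySlug =
        new Dictionary<string, int>();

    // Simplified normalizer (an assumption, not the routine from the answer).
    public static string Slugify(string name)
    {
        string s = name.Trim().ToLowerInvariant().Replace("&", " and ");
        return NonAlnum.Replace(s, "-").Trim('-');
    }

    // Generates and stores the URL segment once, when the category is created.
    // Returns false if the segment is empty or already taken (a collision).
    public bool TryAdd(int categoryId, string name, out string slug)
    {
        slug = Slugify(name);
        if (slug.Length == 0 || idsBySlug.ContainsKey(slug))
        {
            return false;
        }
        idsBySlug.Add(slug, categoryId);
        return true;
    }

    // Used by the rewriting handler: stored URL segment -> category ID.
    public bool TryResolve(string slug, out int categoryId)
    {
        return idsBySlug.TryGetValue(slug, out categoryId);
    }
}
```

With this sketch, adding "Beverage & Bar" stores the segment "beverage-and-bar", and a later attempt to add "Beverage and Bar" is rejected as a collision, which is the hint that the two categories may be duplicates. The handler never re-runs the normalizer on incoming requests; it only looks up the stored segment.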

eulerfx answered Oct 17 '22 04:10