Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Server.UrlEncode vs. HttpUtility.UrlEncode

People also ask

What is HttpUtility UrlEncode?

HttpUtility.HtmlEncode Method (System.Web)Converts a string into an HTML-encoded string. To encode or decode values outside of a web application, use the WebUtility class.

Do I need to UrlEncode?

Why do we need to encode? URLs can only have certain characters from the standard 128 character ASCII set. Reserved characters that do not belong to this set must be encoded. This means that we need to encode these characters when passing into a URL.

How do you change the special characters in a URL?

URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits. URLs cannot contain spaces. URL encoding normally replaces a space with a plus (+) sign or with %20.


I had significant headaches with these methods before, I recommend you avoid any variant of UrlEncode, and instead use Uri.EscapeDataString - at least that one has a comprehensible behavior.

Let's see...

HttpUtility.UrlEncode(" ") == "+" //breaks ASP.NET when used in paths, non-
                                  //standard, undocumented.
Uri.EscapeUriString("a?b=e") == "a?b=e" // makes sense, but rarely what you
                                        // want, since you still need to
                                        // escape special characters yourself

But my personal favorite has got to be HttpUtility.UrlPathEncode - this thing is really incomprehensible. It encodes:

  • " " ==> "%20"
  • "100% true" ==> "100%%20true" (ok, your url is broken now)
  • "test A.aspx#anchor B" ==> "test%20A.aspx#anchor%20B"
  • "test A.aspx?hmm#anchor B" ==> "test%20A.aspx?hmm#anchor B" (note the difference with the previous escape sequence!)

It also has the lovelily specific MSDN documentation "Encodes the path portion of a URL string for reliable HTTP transmission from the Web server to a client." - without actually explaining what it does. You are less likely to shoot yourself in the foot with an Uzi...

In short, stick to Uri.EscapeDataString.


HttpServerUtility.UrlEncode will use HttpUtility.UrlEncode internally. There is no specific difference. The reason for existence of Server.UrlEncode is compatibility with classic ASP.


Fast-forward almost 9 years since this was first asked, and in the world of .NET Core and .NET Standard, it seems the most common options we have for URL-encoding are WebUtility.UrlEncode (under System.Net) and Uri.EscapeDataString. Judging by the most popular answer here and elsewhere, Uri.EscapeDataString appears to be preferable. But is it? I did some analysis to understand the differences and here's what I came up with:

  • WebUtility.UrlEncode encodes space as +; Uri.EscapeDataString encodes it as %20.
  • Uri.EscapeDataString percent-encodes !, (, ), and *; WebUtility.UrlEncode does not.
  • WebUtility.UrlEncode percent-encodes ~; Uri.EscapeDataString does not.
  • Uri.EscapeDataString throws a UriFormatException on strings longer than 65,520 characters; WebUtility.UrlEncode does not. (A more common problem than you might think, particularly when dealing with URL-encoded form data.)
  • Uri.EscapeDataString throws a UriFormatException on the high surrogate characters; WebUtility.UrlEncode does not. (That's a UTF-16 thing, probably a lot less common.)

For URL-encoding purposes, characters fit into one of 3 categories: unreserved (legal in a URL); reserved (legal in but has special meaning, so you might want to encode it); and everything else (must always be encoded).

According to the RFC, the reserved characters are: :/?#[]@!$&'()*+,;=

And the unreserved characters are alphanumeric and -._~

The Verdict

Uri.EscapeDataString clearly defines its mission: %-encode all reserved and illegal characters. WebUtility.UrlEncode is more ambiguous in both definition and implementation. Oddly, it encodes some reserved characters but not others (why parentheses and not brackets??), and stranger still it encodes that innocently unreserved ~ character.

Therefore, I concur with the popular advice - use Uri.EscapeDataString when possible, and understand that reserved characters like / and ? will get encoded. If you need to deal with potentially large strings, particularly with URL-encoded form content, you'll need to either fall back on WebUtility.UrlEncode and accept its quirks, or otherwise work around the problem.


EDIT: I've attempted to rectify ALL of the quirks mentioned above in Flurl via the Url.Encode, Url.EncodeIllegalCharacters, and Url.Decode static methods. These are in the core package (which is tiny and doesn't include all the HTTP stuff), or feel free to rip them from the source. I welcome any comments/feedback you have on these.


Here's the code I used to discover which characters are encoded differently:

var diffs =
    from i in Enumerable.Range(0, char.MaxValue + 1)
    let c = (char)i
    where !char.IsHighSurrogate(c)
    let diff = new {
        Original = c,
        UrlEncode = WebUtility.UrlEncode(c.ToString()),
        EscapeDataString = Uri.EscapeDataString(c.ToString()),
    }
    where diff.UrlEncode != diff.EscapeDataString
    select diff;

foreach (var diff in diffs)
    Console.WriteLine($"{diff.Original}\t{diff.UrlEncode}\t{diff.EscapeDataString}");

Keep in mind that you probably shouldn't be using either one of those methods. Microsoft's Anti-Cross Site Scripting Library includes replacements for HttpUtility.UrlEncode and HttpUtility.HtmlEncode that are both more standards-compliant, and more secure. As a bonus, you get a JavaScriptEncode method as well.


Server.UrlEncode() is there to provide backward compatibility with Classic ASP,

Server.UrlEncode(str);

Is equivalent to:

HttpUtility.UrlEncode(str, Response.ContentEncoding);