HttpUtility.HtmlEncode doesn't encode everything

Question

I am interacting with a web server using a desktop client program written in C# on .NET 3.5. I am using Fiddler to see what traffic the web browser sends so that I can emulate it. Sadly, this server is old and is a bit confused about the notions of charsets and UTF-8. Mostly it uses Latin-1.

When I enter data containing "special" characters into the web browser, like "Ω π ℵ ∞ ♣ ♥ ♈ ♉ ♊ ♋ ♌ ♍ ♎ ♏ ♐ ♑ ♒ ♓", Fiddler shows me that they are transmitted from browser to server as follows: "&#9800; &#9801; &#9802; &#9803; &#9804; &#9805; &#9806; &#9807; &#9808; &#9809; &#9810; &#9811; "

But in my client, HttpUtility.HtmlEncode does not convert these characters; it leaves them as is. What do I need to call to convert "♈" to &#9800; and so on?

Rick · Accepted Answer

It seems horribly inefficient, but the only way I can think of to do that is to loop through each character:

// Requires: using System.Text; using System.Web;
public static string MyHtmlEncode(string value)
{
   if (value == null)
      return null;

   // Call the normal HtmlEncode first so that <, >, & and quotes are escaped.
   string encoded = HttpUtility.HtmlEncode(value);
   StringBuilder encodedValue = new StringBuilder(encoded.Length);
   foreach (char c in encoded)
   {
      if (c > 127) // above 7-bit ASCII: write a decimal numeric character reference
         encodedValue.Append("&#").Append((int)c).Append(';');
      else
         encodedValue.Append(c);
   }
   return encodedValue.ToString();
}
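
One limitation of looping over individual char values is that characters outside the Basic Multilingual Plane are stored as two UTF-16 surrogates, so each half would get its own entity. A minimal sketch of a surrogate-aware variant follows. The class and method names (NumericEntityEncoder, Encode) are hypothetical, and it escapes the four HTML metacharacters by hand instead of calling HttpUtility.HtmlEncode so that it compiles without System.Web. Treat it as an illustration under those assumptions, not as the approach the answer itself uses.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class NumericEntityEncoder
{
    // Hypothetical helper: escapes <, >, ", & and writes every code point
    // above 127 as a decimal numeric character reference.
    public static string Encode(string value)
    {
        if (value == null)
            return null;

        StringBuilder sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            switch (c)
            {
                case '<': sb.Append("&lt;"); continue;
                case '>': sb.Append("&gt;"); continue;
                case '"': sb.Append("&quot;"); continue;
                case '&': sb.Append("&amp;"); continue;
            }

            if (c <= 127)
            {
                // Plain ASCII passes through unchanged.
                sb.Append(c);
                continue;
            }

            int codePoint = c;
            // Combine a valid surrogate pair into a single code point.
            if (char.IsHighSurrogate(c)
                && i + 1 < value.Length
                && char.IsLowSurrogate(value[i + 1]))
            {
                codePoint = char.ConvertToUtf32(c, value[i + 1]);
                i++; // skip the low surrogate that was just consumed
            }

            sb.Append("&#");
            sb.Append(codePoint.ToString(CultureInfo.InvariantCulture));
            sb.Append(';');
        }
        return sb.ToString();
    }
}
```

With this sketch, NumericEntityEncoder.Encode("♈ <b>") yields "&#9800; &lt;b&gt;", and a character such as U+1F600 becomes the single entity &#128512; rather than two surrogate entities. An unpaired surrogate is still written as its raw UTF-16 value, which a stricter implementation might reject instead.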

bdukes · Answer

Rick Strahl just posted a blog post, "Html and Uri String Encoding without System.Web", in which he has some custom code that encodes the upper range of characters, too.

/// <summary>
/// HTML-encodes a string and returns the encoded string.
/// </summary>
/// <param name="text">The text string to encode. </param>
/// <returns>The HTML-encoded text.</returns>
public static string HtmlEncode(string text)
{
    if (text == null)
        return null;

    StringBuilder sb = new StringBuilder(text.Length);

    int len = text.Length;
    for (int i = 0; i < len; i++)
    {
        switch (text[i])
        {

            case '<':
                sb.Append("&lt;");
                break;
            case '>':
                sb.Append("&gt;");
                break;
            case '"':
                sb.Append("&quot;");
                break;
            case '&':
                sb.Append("&amp;");
                break;
            default:
                if (text[i] > 159)
                {
                    // decimal numeric entity
                    sb.Append("&#");
                    sb.Append(((int)text[i]).ToString(CultureInfo.InvariantCulture));
                    sb.Append(";");
                }
                else
                    sb.Append(text[i]);
                break;
        }
    }
    return sb.ToString();
}

AnthonyWJones · Answer

The return type of HtmlEncode is a string, which is Unicode and hence has no need to encode these characters.

If the encoding of your output stream is not compatible with these characters, then use HtmlEncode like this:

 HttpUtility.HtmlEncode(outgoingString, Response.Output);

HtmlEncode will then escape the characters appropriately.
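
To make "not compatible with these characters" concrete, here is a small sketch that checks whether a string can be represented in a given target encoding such as Latin-1. The class and method names (EncodingCheck, CanEncode) are hypothetical. It only tests representability by using an encoder that throws on unmappable characters; it makes no claim about what HtmlEncode does internally.

```csharp
using System;
using System.Text;

public static class EncodingCheck
{
    // Hypothetical helper: returns true when every character of text can be
    // represented in the named encoding without substitution.
    public static bool CanEncode(string text, string encodingName)
    {
        // Ask for an encoding instance that throws instead of silently
        // replacing unmappable characters with '?'.
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);
        try
        {
            strict.GetByteCount(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, EncodingCheck.CanEncode("café", "iso-8859-1") should report that the text fits in Latin-1, while a string containing "♈" should not, which is the situation where numeric entities are needed.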

Joel Fillmore · Answer

The AntiXSS library from Microsoft correctly encodes these characters.

AntiXSS on Codeplex

Nuget package (best way to add as a reference)

Tags: html, c#, encoding, utf-8

Anthony