
HttpUtility.HtmlEncode escaping too much?

In our ASP.NET MVC 3 project, the HttpUtility.HtmlEncode method seems to be escaping too many characters. Our web pages are served as UTF-8, but the method still escapes characters like ü or the yen sign ¥, even though these characters can be represented in UTF-8.

So when my asp.net MVC view contains the following piece of code:

    @("<strong>ümlaut</strong>")

Then I would expect the encoder to escape the HTML tags, but not the ü in ümlaut:

    &lt;strong&gt;ümlaut&lt;/strong&gt;

But instead it is giving me the following piece of HTML:

    &lt;strong&gt;&#252;mlaut&lt;/strong&gt;

For completeness, I should also mention that the responseEncoding in the web.config is explicitly set to utf-8, so I would expect the HtmlEncode method to respect this setting.

    <globalization requestEncoding="utf-8" responseEncoding="utf-8" />

Thomas asked Feb 03 '12 13:02


2 Answers

Yes, I have faced the same issue with my web pages. If we look at the code of HtmlEncode, there is a point where it translates this set of characters. Here is the part of the code that escapes them:

if ((ch >= '\x00a0') && (ch < '\x0100'))
{
    output.Write("&#");
    // Cast to int so the decimal code point is written, not the character itself
    output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
    output.Write(';');
}
else
{
    output.Write(ch);
}

Here is the full code of HtmlEncode:

public static unsafe void HtmlEncode(string value, TextWriter output)
{
    if (value != null)
    {
        if (output == null)
        {
            throw new ArgumentNullException("output");
        }
        int num = IndexOfHtmlEncodingChars(value, 0);
        if (num == -1)
        {
            output.Write(value);
        }
        else
        {
            int num2 = value.Length - num;
            fixed (char* str = ((char*) value))
            {
                char* chPtr = str;
                char* chPtr2 = chPtr;
                while (num-- > 0)
                {
                    output.Write(chPtr2[0]);
                    chPtr2++;
                }
                while (num2-- > 0)
                {
                    char ch = chPtr2[0];
                    if (ch <= '>')
                    {
                        switch (ch)
                        {
                            case '&':
                            {
                                output.Write("&amp;");
                                chPtr2++;
                                continue;
                            }
                            case '\'':
                            {
                                output.Write("&#39;");
                                chPtr2++;
                                continue;
                            }
                            case '"':
                            {
                                output.Write("&quot;");
                                chPtr2++;
                                continue;
                            }
                            case '<':
                            {
                                output.Write("&lt;");
                                chPtr2++;
                                continue;
                            }
                            case '>':
                            {
                                output.Write("&gt;");
                                chPtr2++;
                                continue;
                            }
                        }
                        output.Write(ch);
                        chPtr2++;
                        continue;
                    }
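                    // Here is the point: U+00A0 up to (not including) U+0100
                    // is written as a decimal numeric character reference.
                    if ((ch >= '\x00a0') && (ch < '\x0100'))
                    {
                        output.Write("&#");
                        output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));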
                        output.Write(';');
                    }
                    else
                    {
                        output.Write(ch);
                    }
                    chPtr2++;
                }
            }
        }
    }
}
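
To see which characters that range check affects, here is a minimal sketch that mirrors only the condition. `Latin1RangeDemo` and its two methods are hypothetical names introduced for illustration, not part of the framework; only the range bounds are taken from the code above.

```csharp
using System;
using System.Globalization;

public static class Latin1RangeDemo
{
    // Same bounds as the check in HtmlEncode: U+00A0 inclusive to U+0100 exclusive.
    public static bool IsNumericallyEscaped(char ch)
    {
        return ch >= '\u00A0' && ch < '\u0100';
    }

    // Returns what the range check alone would emit for a single character.
    // The <, >, &, " and ' cases are deliberately left out of this sketch.
    public static string Describe(char ch)
    {
        return IsNumericallyEscaped(ch)
            ? "&#" + ((int)ch).ToString(NumberFormatInfo.InvariantInfo) + ";"
            : ch.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Describe('\u00FC')); // ü (U+00FC) is inside the range: &#252;
        Console.WriteLine(Describe('\u00A5')); // ¥ (U+00A5) is inside the range: &#165;
        Console.WriteLine(Describe('\u0100')); // Ā (U+0100) is outside the range, written as-is
        Console.WriteLine(Describe('\u20AC')); // € (U+20AC) is outside the range, written as-is
    }
}
```

So the escaping is tied to a fixed code point range and does not look at the configured response encoding, which is why ü and ¥ are escaped even on a UTF-8 page.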

A possible solution is to write your own custom HtmlEncode, or to use the Anti-Cross Site Scripting (AntiXSS) library from Microsoft.

http://msdn.microsoft.com/en-us/security/aa973814

Aristos answered Nov 07 '22 07:11


As Aristos suggested, we could use the AntiXSS library from Microsoft. It contains a UnicodeCharacterEncoder that behaves as you would expect.

However, because we

  • didn't really want to depend on a third-party library just for HTML encoding, and
  • were quite sure that all of our content could be represented in UTF-8,

we chose to implement our own very basic HTML encoder. You can find the code below. Please feel free to adapt, comment on, or improve it if you see any issues.

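using System.Collections.Generic; // IDictionary, Dictionary
using System.Text;                // StringBuilder
using System.Web;                 // IHtmlString

public static class HtmlEncoder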
{
    private static IDictionary<char, string> toEscape = new Dictionary<char, string>()
                                                            {
                                                                { '<', "lt" },
                                                                { '>', "gt" },
                                                                { '"', "quot" },
                                                                { '&', "amp" },
                                                                { '\'', "#39" },
                                                            };
    /// <summary>
    /// HTML-Encodes the provided value
    /// </summary>
    /// <param name="value">object to encode</param>
    /// <returns>An HTML-encoded string representing the provided value.</returns>
    public static string Encode(object value)
    {
        if (value == null)
            return string.Empty;

        // If value is bare HTML, we expect it to be encoded already
        if (value is IHtmlString)
            return value.ToString();

        string toEncode = value.ToString();

        // Init capacity to length of string to encode
        var builder = new StringBuilder(toEncode.Length);

        foreach (char c in toEncode)
        {
            string result;
            bool success = toEscape.TryGetValue(c, out result);

            string character = success
                                ? "&" + result + ";"
                                : c.ToString();

            builder.Append(character);
        }

        return builder.ToString();
    }
}
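
One caveat: a standalone helper like this does not by itself change what Razor's `@(...)` emits. In ASP.NET 4, one option is to subclass `System.Web.Util.HttpEncoder` and register the subclass through the `encoderType` attribute of `<httpRuntime>` in web.config. The following is a minimal sketch under that assumption, not a hardened implementation; the `MyApp` namespace and the `Utf8FriendlyHttpEncoder` and `EncodeCore` names are hypothetical.

```csharp
using System;
using System.IO;
using System.Web.Util; // HttpEncoder, available from .NET 4.0

namespace MyApp // hypothetical namespace
{
    // Hypothetical encoder: escapes only the five HTML-significant
    // characters and writes everything else (ü, ¥, ...) unchanged.
    public class Utf8FriendlyHttpEncoder : HttpEncoder
    {
        protected override void HtmlEncode(string value, TextWriter output)
        {
            EncodeCore(value, output);
        }

        // Kept public and static so it can be exercised without the ASP.NET pipeline.
        public static void EncodeCore(string value, TextWriter output)
        {
            if (output == null)
                throw new ArgumentNullException("output");
            if (string.IsNullOrEmpty(value))
                return;

            foreach (char c in value)
            {
                switch (c)
                {
                    case '<': output.Write("&lt;"); break;
                    case '>': output.Write("&gt;"); break;
                    case '&': output.Write("&amp;"); break;
                    case '"': output.Write("&quot;"); break;
                    case '\'': output.Write("&#39;"); break;
                    default: output.Write(c); break;
                }
            }
        }
    }
}
```

Assuming the class is compiled into an assembly named `MyApp`, it would be registered with `<httpRuntime encoderType="MyApp.Utf8FriendlyHttpEncoder, MyApp" />`. With that in place, the view snippet from the question should render the tags escaped and the ü untouched, but verify this against your own framework version.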

Thomas answered Nov 07 '22 07:11