Why are HtmlEncode and HtmlDecode not isomorphic in .NET?

Question

I find this surprising, and rather annoying.

Example:

Decode(&rdquo;) => ”
Encode(”)       => ”

Relevant classes:

.NET 4:   System.Net.WebUtility
.NET 3.5: System.Web.HttpUtility

I can understand that a web page can be Unicode, but in my case the output cannot be UTF-8.

Is there something (perhaps an HtmlWriter class) that could do this without me having to reinvent the wheel?

Alternative solution:

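// Requires: using System.Text;
// Escapes every non-ASCII character as a hexadecimal character reference.
// Note: does not escape &, <, > or quotes; run HtmlEncode first on raw text.
string HtmlUnicodeEncode(string input)
{
    var sb = new StringBuilder();

    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];

        if (c > 127)
        {
            // Combine a surrogate pair into one code point so that characters
            // outside the BMP become a single reference instead of two halves.
            int codePoint = c;
            if (char.IsSurrogatePair(input, i))
            {
                codePoint = char.ConvertToUtf32(input, i);
                i++; // skip the low surrogate that was just consumed
            }
            sb.AppendFormat("&#x{0:X4};", codePoint);
        }
        else
        {
            sb.Append(c);
        }
    }

    return sb.ToString();
}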

bobince · Accepted Answer

It is impossible to produce an isomorphic HTML codec pair. Consider:

HtmlDecode("&rdquo;”&#x201D;&#x201d;&#8221;") -> ”””””

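How do you get back from ””””” to the original string? Decoding is many-to-one: five different spellings collapse to the same character, so no encoder can tell which spelling was used.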
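The collapse described above can be reproduced with a small sketch. It assumes .NET 4 or later, where System.Net.WebUtility is available; the class name EntityDemo and its members are made up for illustration and are not part of any library.

```csharp
using System;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class EntityDemo
{
    // Five different spellings of U+201D (right double quotation mark).
    public static readonly string[] Spellings =
    {
        "&rdquo;", "\u201D", "&#x201D;", "&#x201d;", "&#8221;"
    };

    // Decodes each spelling and reports whether all of them collapse
    // to the same single character.
    public static bool AllDecodeToSameChar()
    {
        foreach (var s in Spellings)
        {
            if (WebUtility.HtmlDecode(s) != "\u201D")
            {
                return false;
            }
        }
        return true;
    }

    // Decodes, then re-encodes. Whatever spelling went in, only the one
    // spelling the encoder prefers can come back out.
    public static string RoundTrip(string encoded)
    {
        return WebUtility.HtmlEncode(WebUtility.HtmlDecode(encoded));
    }
}
```

Calling EntityDemo.RoundTrip on any two of the spellings should give identical results, which is exactly why the original spelling cannot be recovered.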

HtmlEncode has to pick one encoding for ”, and it goes for ” as the shortest, most readable alternative. As long as you've got working Unicode, that's almost certainly the best choice.

If you don't, that's another argument... the advantage of &rdquo; is that it's slightly more readable than &#x201D;, but it only works in HTML (not XML) and you still have to fall back to character references for all the Unicode characters that don't have built-in entity names, so it's less consistent. For a character-reference encoder, create an XmlTextWriter using the ASCII encoding and call WriteString on it.
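A rough sketch of the character-reference idea follows. Note that it swaps in the XmlWriter.Create factory with XmlWriterSettings instead of constructing an XmlTextWriter directly, and it relies on the assumption that a stream-based XmlWriter escapes characters the target encoding cannot represent as numeric character references. The class name AsciiEscaper and the method Escape are made up for illustration.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class AsciiEscaper
{
    // Writes the input as XML text into an ASCII-encoded stream.
    // Assumption: characters that ASCII cannot represent are emitted by the
    // writer as numeric character references rather than being lost.
    public static string Escape(string input)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,
            // Fragment conformance allows bare text with no root element.
            ConformanceLevel = ConformanceLevel.Fragment,
            OmitXmlDeclaration = true,
            // Leave line endings exactly as they appear in the input.
            NewLineHandling = NewLineHandling.None
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                // WriteString also escapes markup characters such as & and <.
                writer.WriteString(input);
            }
            // The writer is flushed on dispose; read back the ASCII bytes.
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

Two caveats with this sketch: the writer applies XML text escaping, so quotation marks are left alone (unlike HtmlEncode), and with default settings it throws on characters that are not legal in XML 1.0, such as most control characters.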

Tags:

.net

html-encode

unicode

html-entities

leppie
