I find this surprising, and rather annoying.
Example:
Decode(&rdquo;) => ”
Encode(”) => ”
Relevant classes:
.NET 4: System.Net.WebUtility
.NET 3.5: System.Web.HttpUtility
I can understand that a web page can be Unicode, but in my case the output cannot be UTF-8.
Is there something (perhaps an HtmlWriter class) that could do this without me having to reinvent the wheel?
Alternative solution:
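    // Requires: using System.Text;
    // Replaces every non-ASCII character with a hexadecimal character reference.
    // It does not escape &, <, > or quotes, so apply it to text that is already HTML-encoded.
    string HtmlUnicodeEncode(string input)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c > 127)
            {
                // Combine a surrogate pair into one code point so that characters
                // above U+FFFF get a single, valid reference.
                int codePoint = c;
                if (char.IsSurrogatePair(input, i))
                {
                    codePoint = char.ConvertToUtf32(input, i);
                    i++;
                }
                sb.AppendFormat("&#x{0:X4};", codePoint);
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }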
It is impossible to produce an isomorphic HTML codec pair. Consider:
    HtmlDecode("&rdquo;&#8221;&#x201D;&#x201d;”") -> ”””””
How do you get back from ””””” to the original string?
HtmlEncode has to pick one encoding for ”, and it goes for the raw character ” as the shortest, most readable alternative. As long as you've got working Unicode, that's almost certainly the best choice.
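To make the many-to-one point concrete, here is a minimal sketch, assuming .NET 4 or later so that System.Net.WebUtility is available. The class and method names (RoundTripDemo, RoundTrip) are made up for illustration.

```csharp
using System.Net;

static class RoundTripDemo
{
    // Hypothetical helper: decodes an entity spelling, then encodes the result again.
    public static string RoundTrip(string html)
    {
        string decoded = WebUtility.HtmlDecode(html);
        return WebUtility.HtmlEncode(decoded);
    }

    // True when two spellings decode to the same text, which means
    // no encoder can know which of them was the original.
    public static bool DecodeToSameText(string first, string second)
    {
        return WebUtility.HtmlDecode(first) == WebUtility.HtmlDecode(second);
    }
}
```

Calling RoundTripDemo.DecodeToSameText("&rdquo;", "&#8221;") should return true, and RoundTripDemo.RoundTrip gives the same result for "&rdquo;", "&#8221;" and "&#x201D;", so at most one of those spellings can survive a round trip.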
If you don't, that's another argument... the advantage of &rdquo; is that it's slightly more readable than &#8221;, but it only works in HTML (not XML) and you still have to fall back to character references for all the Unicode characters that don't have built-in entity names, so it's less consistent. For a character-reference encoder, create an XmlTextWriter using the ASCII encoding and call WriteString on it.
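A rough sketch of that idea follows. Note the substitution: it uses XmlWriter.Create with XmlWriterSettings rather than constructing an XmlTextWriter directly, on the assumption that the settings-based writer replaces characters the target encoding cannot represent with character references. The AsciiMarkup class and its Encode method are hypothetical names, and the exact form of the references should be checked on your framework version.

```csharp
using System.IO;
using System.Text;
using System.Xml;

static class AsciiMarkup
{
    // Hypothetical helper: escapes markup characters and writes the text
    // through an ASCII-encoded XML writer, so the result contains only ASCII.
    public static string Encode(string input)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = Encoding.ASCII,
            OmitXmlDeclaration = true,
            // Fragment level allows text content without a root element.
            ConformanceLevel = ConformanceLevel.Fragment,
            // Leave line breaks in the input untouched.
            NewLineHandling = NewLineHandling.None
        };

        using (var stream = new MemoryStream())
        {
            using (var writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteString(input);
            }
            // The writer has been flushed by Dispose; read back the ASCII bytes.
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

AsciiMarkup.Encode("say ”hi” & <go>") would then return a pure ASCII string with the markup characters escaped as well. Because the writer follows XML rules, only numeric references and the five XML entities appear, never HTML names such as &rdquo;.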