C# HtmlEncode - ISO-8859-1 Entity Names vs Numbers

Tags: string, c#, .net, encoding, iso

According to the following table for the ISO-8859-1 standard, there seems to be an entity name and an entity number associated with each reserved HTML character.

So, for example, for the character é:

Entity Name : &eacute;

Entity Number : &#233;

Similarly, for the character >:

Entity Name : &gt;

Entity Number : &#62;

For a given string, HttpUtility.HtmlEncode returns an HTML-encoded string, but I can't figure out how it decides which form to use. Here is what I mean:

Console.WriteLine(HtmlEncode("é>"));
//Outputs &#233;&gt;

It seems to be using the entity number for the é character but the entity name for the > character.

So does the HtmlEncode method really work with the ISO-8859-1 standard? If it does, is there a reason why it sometimes uses the entity name and other times the entity number? More importantly, can I force it to give me the entity name reliably?

EDIT : Thanks for the answers guys. I cannot decode the string before I perform the search though. Without getting into too many details, the text is stored in a SharePoint List and the "search" is done by SharePoint itself (using a CAML query). So basically, I can't.

I'm trying to think of a way to convert the entity numbers into names. Is there a function in .NET that does that? Or any other idea?


asked Jan 31 '11 17:01

Hugo Migneron

1 Answer

That's how the method has been implemented. For a handful of known characters (&, ', ", < and >) it writes a hard-coded reference, which is a named entity in every case except the apostrophe (&#39;). For characters in the range U+00A0 to U+00FF it writes a decimal numeric character reference. There is not much you can do to modify this behavior. Excerpt from the implementation of System.Net.WebUtility.HtmlEncode (as seen with Reflector):

...
if (ch <= '>')
{
    switch (ch)
    {
        case '&':
        {
            output.Write("&amp;");
            continue;
        }
        case '\'':
        {
            output.Write("&#39;");
            continue;
        }
        case '"':
        {
            output.Write("&quot;");
            continue;
        }
        case '<':
        {
            output.Write("&lt;");
            continue;
        }
        case '>':
        {
            output.Write("&gt;");
            continue;
        }
    }
    output.Write(ch);
    continue;
}
if ((ch >= '\x00a0') && (ch < 'Ā'))
{
    output.Write("&#");
    output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
    output.Write(';');
}
...

That being said, you shouldn't need to care, as both forms are valid HTML and the method's output is safely and correctly encoded.
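If the named form is genuinely required (for instance because the stored text uses names), one possible workaround is to post-process the encoder's output and swap each decimal reference for its name. The following is only a minimal sketch under stated assumptions: the class NamedEntityEncoder, its methods, and the lookup table are hypothetical names introduced here, and the table is deliberately incomplete (a real one would need every entity you expect to meet). Only WebUtility.HtmlEncode, Regex, and Dictionary are real .NET APIs.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of .NET.
public static class NamedEntityEncoder
{
    // Assumption: a small, incomplete code point -> entity name table,
    // taken from the HTML Latin-1 entity set. Extend it as needed.
    private static readonly Dictionary<int, string> Names = new Dictionary<int, string>
    {
        { 160, "nbsp" },
        { 169, "copy" },
        { 224, "agrave" },
        { 232, "egrave" },
        { 233, "eacute" },
        { 234, "ecirc" },
    };

    // Matches decimal numeric character references such as &#233;
    private static readonly Regex DecimalReference = new Regex("&#([0-9]+);");

    // Replaces every decimal reference that has a known name; anything
    // not in the table (for example &#39;) is left exactly as it was.
    public static string ReplaceNumericReferences(string encoded)
    {
        return DecimalReference.Replace(encoded, match =>
        {
            int codePoint;
            string name;
            if (int.TryParse(match.Groups[1].Value, out codePoint)
                && Names.TryGetValue(codePoint, out name))
            {
                return "&" + name + ";";
            }
            return match.Value;
        });
    }

    // Encodes with the framework method first, then rewrites the result.
    public static string Encode(string text)
    {
        return ReplaceNumericReferences(WebUtility.HtmlEncode(text));
    }
}
```

With this sketch, NamedEntityEncoder.ReplaceNumericReferences("&#233;&gt;") yields &eacute;&gt;. Because the replacement runs on text that has already been encoded, a literal "&#233;" typed by a user has become "&amp;#233;" by that point and is not rewritten.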


answered Sep 28 '22 03:09

Darin Dimitrov
