I'm building a website where I have to work with less than perfect master data (I guess I'm not the only one :-))
In my case I have to render an XML file to HTML (using XSL). Sometimes the master data already uses HTML entities (e.g. &eacute;
in French words), so there I have to use disable-output-escaping="yes" in order to avoid double encoding.
The easiest solution is to disable output escaping altogether, so I never run the risk of double encoding.
The only characters in this master data that are not encoded are the ampersands. But when I output them 'raw' (so & rather than &amp;),
all browsers seem to be OK with it.
So the question: what are the consequences of using unencoded ampersands in HTML?
In HTML, the ampersand character ("&") declares the beginning of an entity reference (a special character). If you want one to appear in text on a web page you should use the encoded named entity "&amp;" (more technical mumbo-jumbo at w3c.org).
For example, to encode an ampersand character inside a URL, use %26. However, in HTML, use either &amp; or &#38;, both of which would write out the ampersand in the HTML page.
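To make the difference between the two encodings concrete, here is a minimal Python sketch using only the standard library. The sample value and the "search?q=...&lang=fr" URL are made up for illustration; the point is that percent-encoding applies to data inside a URL, while &amp; applies to anything written into HTML, including the URL itself when it ends up in an attribute.

```python
from html import escape
from urllib.parse import quote

value = "Tom & Jerry"  # hypothetical piece of master data

# Percent-encoding: for data placed inside a URL component.
url_part = quote(value, safe="")

# HTML escaping: for text written into an HTML page.
html_text = escape(value)

# A URL that goes into an href needs both: percent-encode the data,
# then HTML-escape the finished URL (its "&" separator becomes "&amp;").
href = escape("search?q=" + url_part + "&lang=fr")

print(url_part)   # Tom%20%26%20Jerry
print(html_text)  # Tom &amp; Jerry
print(href)       # search?q=Tom%20%26%20Jerry&amp;lang=fr
```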
HTML and XHTML include blocks of what is called CDATA, where HTML special characters no longer have special meaning. Inside such blocks character references are no longer processed, so an ampersand must be typed as an ampersand, and not as its character reference.
No difference. UTF-8 doesn't matter because & is reserved anyway. So use &amp;.
AFAIK bare ampersands are illegal in HTML. With that out of the way, let's look at the consequences:
"& " is "clearly" an ampersand followed by a space, and "&copy;" is clearly the copyright symbol. But what about the text fragment "edit&copy"? The browser I'm using right now mangles it. Since it's more difficult to detect and account for these cases manually than it is to replace all ampersands that are not part of entities (say with a regex), you should really do the latter.
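The regex replacement suggested here could look roughly like the following Python sketch. The function name and the pattern are my own assumptions, not something from the original data pipeline: an ampersand is left alone only if it starts something shaped like a named, decimal or hexadecimal reference that ends in a semicolon; every other ampersand becomes &amp;.

```python
import re

# "&" NOT followed by something shaped like a complete reference:
#   name;      e.g. &eacute;
#   #digits;   e.g. &#233;
#   #xhex;     e.g. &#xE9;
BARE_AMP = re.compile(
    r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)"
)

def escape_bare_ampersands(text: str) -> str:
    """Replace ampersands that do not start an entity-like reference."""
    return BARE_AMP.sub("&amp;", text)

print(escape_bare_ampersands("caf&eacute; & bar"))  # caf&eacute; &amp; bar
print(escape_bare_ampersands("edit&copy"))          # edit&amp;copy
```

Note that the pattern only checks the shape of a reference, not whether the name is a real HTML entity, so something like "&foo;" is left untouched. If the master data can contain such strings, the name would have to be checked against a list of known entities as well.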