
Can I use unencoded ampersands (&) in html? [duplicate]

I'm building a website where I have to work with less-than-perfect master data (I guess I'm not the only one :-)).

In my case I have to render an XML file to HTML (using XSL). Sometimes the master data already uses HTML entities (e.g. &eacute; in French words), so there I have to use disable-output-escaping="yes" in order to avoid double encoding.

The easiest solution is to disable output escaping altogether, so I never run the risk of double encoding.

The only characters in this master data that are missing their encoding are the ampersands. But when I output them raw (so & rather than &amp;), all browsers seem to be OK with it.

So the question: what are the consequences of using unencoded ampersands in HTML?

Peter asked Jun 27 '12 07:06




1 Answer

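AFAIK bare ampersands are invalid in HTML 4 and XHTML; HTML5 tolerates them only where they cannot be mistaken for the start of a character reference. With that out of the way, let's look at the consequences: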

  • You are now relying on the browser's ability to detect and gracefully recover from the problem. Note that in order to do this, the browser has to guess: "& " is clearly an ampersand followed by a space, and &copy; is clearly the copyright symbol. But what about the text fragment edit&copy? The browser I'm using right now mangles it.
  • If you are using XHTML, or if the content is ever going to be inserted into an XML document, the result will be a hard parser error.
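Both consequences can be reproduced outside a browser. The following is a minimal sketch in Python, not part of the original setup: html.unescape follows the HTML5 character reference rules and is used here only as a stand-in for the browser's guess, and is_well_formed_xml is a hypothetical helper name chosen for illustration.

```python
import html
import xml.etree.ElementTree as ET


def is_well_formed_xml(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False on a parser error.

    Hypothetical helper for illustration; a real XHTML pipeline would
    surface the error instead of swallowing it.
    """
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# A bare ampersand is a hard error for an XML parser.
bare_ok = is_well_formed_xml("<p>Tom & Jerry</p>")
encoded_ok = is_well_formed_xml("<p>Tom &amp; Jerry</p>")

# An HTML5-style decoder has to guess: "& " is left alone, but "&copy"
# without a semicolon is still read as the copyright sign.
spaced = html.unescape("Tom & Jerry")
mangled = html.unescape("edit&copy")
```

Here bare_ok is False and encoded_ok is True, while mangled no longer contains the literal text "&copy".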

Since it's more difficult to detect and account for these cases manually than it is to replace all ampersands that are not part of entities (say with a regex), you should really do the latter.
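As a rough sketch of the regex approach, assuming the clean-up runs in Python over the text before it is emitted: the pattern below treats an ampersand as part of an entity only if it is followed by a name or a numeric code and a closing semicolon. The names BARE_AMP and escape_bare_ampersands are made up for this example.

```python
import re

# "&" that is NOT followed by a named reference (&name;), a decimal
# reference (&#233;) or a hexadecimal reference (&#xE9;).
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every ampersand that does not start an entity with &amp;."""
    return BARE_AMP.sub("&amp;", text)


sample = "caf&eacute; & bar, edit&copy, R&amp;D, &#233;"
cleaned = escape_bare_ampersands(sample)
```

Existing entities pass through untouched, so running the function twice gives the same result as running it once. One limitation to keep in mind: the pattern only checks the shape of the reference, not whether the name is a real entity, so text such as "a&b;c" is left as it is.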

Jon answered Sep 19 '22 05:09