Escaping JavaScript[/CSS] between <script>[/<style>] tags: Insights on a potentially broken status quo

I'm a web developer with an emphasis on server-side programming. What little I've tinkered with JavaScript, I've done with externally referenced files or event handlers, and the barest minimum of an initialising function call between <script> tags.

As such it came as a surprise to me about a week ago that the data between <script> tags is not commonly escaped. In fact... it can't be. Escaping it will throw a massive lolwut-ohnoez-wrench into the works of the JavaScript parser in, as far as I know, every browser on the face of the earth.

This leads us to the (IMO) clusterfuck that is having to use CDATA for documents with in-HTML JavaScript blocks to pass validation (in XHTML), which still breaks hilariously the moment you have ]]> in your code for any arbitrary reason.

As something of an encoding/escaping purist, I get the twitches looking at this. And for several days now I've asked myself:

Why?

Whose idea was it to exempt <script> (and, for example, quite distinctly not the JS event handlers like onclick) from the otherwise holy rule of 'non-HTML stuff between HTML tags should be HTML escaped', and why? Is it a case of 'this just grew that way historically, it's botched now, deal with it', or did someone sit down and think up something I'm not seeing?

The same is true (though less obviously so) for CSS and the <style> tag.

Do we even know what prompted this, or is it a case of lost knowledge? My Google-fu on this topic has been incredibly weak, and I've not found anything, but since this is actually bugging me in pathetically OCD ways, I'd love to hear explanations if anyone has any.

pinkgothic, asked Sep 15 '10 18:09


2 Answers

Because it is very common to want to use characters such as & and < in scripts, and escaping them is a pain.

On the flip side, <script> and <style> can't have child elements, so there is no need to make it easy to include a tag.

The result: HTML (as of HTML 4) defines <script> and <style> as containing CDATA in the DTD, so their content is not scanned for entities or tags and you don't need to mark it as CDATA manually in the document, which makes life easier.
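To make the consequence concrete: since the content of <script> is not entity-decoded, the one thing an inline script must not contain is the sequence that ends the element. Here is a minimal sketch, assuming you are generating an inline script that embeds JSON data; `serializeForInlineScript` and `payload` are made-up names for illustration, not part of any library.

```javascript
// Hypothetical helper: serialize a value for embedding in an inline <script> block.
function serializeForInlineScript(value) {
  // In JSON output, "<" can only occur inside string literals, so swapping it
  // for the \u003c escape leaves the parsed data unchanged while making sure
  // the text never contains "</script".
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

// Example data that would otherwise close the script element early.
const payload = { note: "</script><b>oops</b>" };

const inline =
  "<script>var data = " + serializeForInlineScript(payload) + ";</script>";
```

Once the script runs, `data.note` is the original string again, because `\u003c` is just another way of writing `<` inside a JavaScript or JSON string literal.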

XHTML is different. In many ways XML is simpler than SGML, and its DTDs don't (as far as I know) have that facility. Hence, you need to be explicit about CDATA markers (or use entities) in XHTML. The only reason it is a "clusterfuck" is that people claim their XHTML is HTML by serving it with a text/html content-type (instead of the correct application/xhtml+xml).
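For what it's worth, the `]]>` problem from the question can be worked around when the document really is parsed as XML. A rough sketch, assuming the page is served as application/xhtml+xml; `wrapInCdata` is a hypothetical helper name, not an existing API:

```javascript
// Hypothetical helper: wrap script source for inline use in a real XHTML document.
function wrapInCdata(source) {
  // "]]>" would end the CDATA section early, so split it across two sections:
  // the first section ends with "]]" and the next one starts with ">".
  const safe = source.split("]]>").join("]]]]><![CDATA[>");
  return "<![CDATA[" + safe + "]]>";
}

// Perfectly ordinary JavaScript that happens to contain "]]>".
const xhtmlScriptBody = wrapInCdata("if (a[b[0]]>1) alert('x');");
```

In an XML parser, the character data of the two adjacent CDATA sections joins back into the original source. Served as text/html, the same markers would land in the middle of the JavaScript and break it, which is the mismatch described above.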

As for intrinsic event attributes, SGML doesn't make it possible to declare that special characters in an attribute value should not be treated as such, so they have to be escaped there. But when such attributes are used, they shouldn't contain much more than a function call … and are better avoided in favour of unobtrusive JS anyway.

Quentin, answered Nov 08 '22 15:11


Because in JavaScript you are constantly using characters that would need to be escaped in HTML. That is the point of having CDATA after all, isn't it?

Tell me which you think looks more reasonable:

if (5 &gt; 4 &amp;&amp; 2 &lt; 3) alert('dude');

Or

if (5 > 4 && 2 < 3) alert('dude');
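
For the record, the first version is exactly what a plain HTML text escaper produces from the second. A minimal sketch; `escapeHtmlText` is a hypothetical helper, not a built-in:

```javascript
// Hypothetical helper: escape the characters that are special in HTML text content.
function escapeHtmlText(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities added below stay intact
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const readable = "if (5 > 4 && 2 < 3) alert('dude');";
const escaped = escapeHtmlText(readable);
```

That is what every inline script would have to go through if <script> content were parsed like ordinary text.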

Also, in the vast majority of cases, both CSS and JavaScript should be included as links to separate files rather than inlined in HTML, thus avoiding the escaping issue entirely.

MooGoo, answered Nov 08 '22 17:11