Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Escaping HTML entities in JavaScript string literals within the <script> block

On the one hand if I have

<script> var s = 'Hello </script>'; console.log(s); </script> 

the browser will terminate the <script> block early and basically I get the page screwed up.

On the other hand, the value of the string may come from a user (say, via a previously submitted form, and now the string ends up being inserted into a <script> block as a literal), so you can expect anything in that string, including maliciously formed tags. Now, if I escape the string literal with htmlentities() when generating the page, the value of s will contain the escaped entities literally, i.e. s will output

Hello &lt;/script&gt; 

which is not desired behavior in this case.

One way of properly escaping JS strings within a <script> block is escaping the slash if it follows the left angle bracket, or just always escaping the slash, i.e.

var s = 'Hello <\/script>'; 

This seems to be working fine.

Then comes the question of JS code within HTML event handlers, which can be easily broken too, e.g.

<div onClick="alert('Hello ">')"></div> 

looks valid at first but breaks in most (or all?) browsers. This, obviously requires the full HTML entity encoding.

My question is: what is the best/standard practice for properly covering all the situations above - i.e. JS within a script block, JS within event handlers - if your JS code can partly be generated on the server side and can potentially contain malicious data?

like image 316
mojuba Avatar asked Jan 05 '12 20:01

mojuba


People also ask

How do you escape a string literal in JavaScript?

To escape a backtick in a template literal, put a backslash ( \ ) before the backtick.

How do you escape a string literal?

String literal syntaxUse the escape sequence \n to represent a new-line character as part of the string. Use the escape sequence \\ to represent a backslash character as part of the string. You can represent a single quotation mark symbol either by itself or with the escape sequence \' .


1 Answers

The following characters could interfere with an HTML or Javascript parser and should be escaped in string literals: <, >, ", ', \, and &.

In a script block using the escape character, as you found out, works. The concatenation method (</scr' + 'ipt>') can be hard to read.

var s = 'Hello <\/script>'; 

For inline Javascript in HTML, you can use entities:

<div onClick="alert('Hello &quot;>')">click me</div> 

Demo: http://jsfiddle.net/ThinkingStiff/67RZH/

The method that works in both <script> blocks and inline Javascript is \uxxxx, where xxxx is the hexadecimal character code.

  • < - \u003c
  • > - \u003e
  • " - \u0022
  • ' - \u0027
  • \ - \u005c
  • & - \u0026

Demo: http://jsfiddle.net/ThinkingStiff/Vz8n7/

HTML:

<div onClick="alert('Hello \u0022>')">click me</div>  <script>     var s = 'Hello \u003c/script\u003e'; alert( s ); </script>    
like image 97
ThinkingStiff Avatar answered Sep 26 '22 23:09

ThinkingStiff