
How to insert arbitrary JSON in HTML's script tag

I would like to store JSON content in an HTML document's source, inside a script tag.

The JSON content depends on user-submitted input, so the string must be carefully sanitised against XSS.

I've read about two approaches here on SO.

1. Replace all occurrences of </script with <\/script, or replace all </ with <\/, on the server side.

Code wise it looks like the following (using Python and jinja2 for the example):

// view
import json

data = {
    'test': 'asdas</script><b>as\'da</b><b>as"da</b>',
}

context_dict = {
    'data_json': json.dumps(data, ensure_ascii=False).replace('</script', r'<\/script'),
}

// template
<script>
    var data_json = {{ data_json | safe }};
</script>

// js
access it simply as the window.data_json object

2. Encode the data as an HTML-entity-encoded JSON string, then unescape and parse it on the client side. The unescape function is from this answer: https://stackoverflow.com/a/34064434/518169

// view
context_dict = {
    'data_json': json.dumps(data, ensure_ascii=False),
}

// template
<script>
    var data_json = '{{ data_json }}'; // encoded into HTML entities, like &lt; &gt; &amp;
</script>

// js
function htmlDecode(input) {
  var doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

var decoded = htmlDecode(window.data_json);
var data_json = JSON.parse(decoded);

This method doesn't work because \" in the script source becomes " in the JS variable, so the escaped quote loses its backslash before JSON.parse sees it. It also produces a much bigger HTML document that is not really human-readable, so I'd go with the first one unless it poses a serious security risk.

Is there any security risk in using the first version? Is it enough to sanitise a JSON encoded string with .replace('</script', r'<\/script')?

Reference on SO:
Best way to store JSON in an HTML attribute?
Why split the <script> tag when writing it with document.write()?
Script tag in JavaScript string
Sanitize <script> element contents
Escape </ in script tag contents

Some great external resources about this issue:
Flask's tojson filter's implementation source
Rails' json_escape method's help and source
A 5-year-long discussion in a Django ticket, with proposed code

hyperknot asked Aug 28 '16 16:08

1 Answer

Here's how I dealt with the relatively minor part of this issue: the encoding problem with storing JSON in a script element. The short answer is that you have to escape either the < or the / because, together in </script, they terminate the script element, even inside a JSON string literal. You can't HTML-encode entities inside a script element, because the browser does not decode them there. You could JavaScript-backslash-escape the slash. I preferred to JavaScript-Unicode-escape the less-than angle bracket as \u003C.

.replace('<', r'\u003C')
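As a rough illustration of why escaping every < (or every </) is more robust than replacing only the literal </script from the question, here is a minimal sketch. The payload is a made-up example; it relies on the fact that HTML tag names are case-insensitive, so an upper-case closing tag also ends the script element.

```python
import json

# Hypothetical payload: the closing tag is upper-case, which HTML treats the same.
data = {"test": "foo </SCRIPT><b>bar</b>"}
raw = json.dumps(data)

# Case-sensitive replacement from the question: the upper-case tag slips through.
by_tag = raw.replace("</script", r"<\/script")

# Escaping every "</" or every "<" does not depend on how the tag is spelled.
by_slash = raw.replace("</", r"<\/")
by_lt = raw.replace("<", r"\u003C")

print("</SCRIPT" in by_tag)   # True: still terminates the script element
print("</" in by_slash)       # False
print("<" in by_lt)           # False

# Both escaped forms are still valid JSON and decode to the original data.
print(json.loads(by_slash) == data)   # True
print(json.loads(by_lt) == data)      # True
```

Both \/ and \u003C are legal JSON string escapes, so either form can be parsed by any JSON parser as well as evaluated as a JavaScript literal.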

I ran into this problem trying to pass the JSON from oEmbed results. Some of them contain script close tags (without mentioning Twitter by name).

json_for_script = json.dumps(data).replace('<', r'\u003C')

This turns data = {'test': 'foo </script> bar'} into

'{"test": "foo \\u003C/script> bar"}'

which is valid JSON that won't terminate a script element.
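A quick way to check that claim is to round-trip the escaped string through json.loads. This is only a sketch using the example data above. Note that the doubled backslash shown above is Python's repr of the string; the string itself contains a single backslash.

```python
import json

data = {"test": "foo </script> bar"}
json_for_script = json.dumps(data).replace("<", r"\u003C")

# print() shows the real characters, with a single backslash.
print(json_for_script)   # {"test": "foo \u003C/script> bar"}

# No "<" is left, so nothing in the payload can close the script element.
assert "<" not in json_for_script

# The escaped text is still valid JSON and decodes to the original data.
assert json.loads(json_for_script) == data
```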

I got the idea from this little gem inside the Jinja template engine. It's what's run when you use the {{data|tojson}} filter.

def htmlsafe_json_dumps(obj, dumper=None, **kwargs):
    """Works exactly like :func:`dumps` but is safe for use in ``<script>``
    tags.  It accepts the same arguments and returns a JSON string.  Note that
    this is available in templates through the ``|tojson`` filter which will
    also mark the result as safe.  Due to how this function escapes certain
    characters this is safe even if used outside of ``<script>`` tags.
    The following characters are escaped in strings:
    -   ``<``
    -   ``>``
    -   ``&``
    -   ``'``
    This makes it safe to embed such strings in any place in HTML with the
    notable exception of double quoted attributes.  In that case single
    quote your attributes or HTML escape it in addition.
    """
    if dumper is None:
        dumper = json.dumps
    rv = dumper(obj, **kwargs) \
        .replace(u'<', u'\\u003c') \
        .replace(u'>', u'\\u003e') \
        .replace(u'&', u'\\u0026') \
        .replace(u"'", u'\\u0027')
    return Markup(rv)
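For experimenting without a template engine, the same escaping can be sketched with only the standard library. The name htmlsafe_dumps is hypothetical, and unlike the function above it returns a plain str instead of Markup, so a template would still need to mark the result as safe.

```python
import json

def htmlsafe_dumps(obj, **kwargs):
    """Stdlib-only sketch of the escaping above; returns a plain str."""
    return (
        json.dumps(obj, **kwargs)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )

# The test data from the question, including a script close tag and both quotes.
data = {"test": "asdas</script><b>as'da</b><b>as\"da</b>"}
payload = htmlsafe_dumps(data)

# Build the script element by hand, the way a template would render it.
page = "<script>var data_json = " + payload + ";</script>"

assert not any(ch in payload for ch in "<>&'")   # all four characters are escaped
assert page.count("</script") == 1               # only the real closing tag remains
assert json.loads(payload) == data               # still valid JSON, same data
```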

(You could use \x3C instead of \u003C and that would work in a script element because it's valid JavaScript. But might as well stick to valid JSON.)
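The difference can be checked with Python's json module, which accepts \u003C but rejects \x3C. The helper name is_valid_json is made up for this sketch.

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# \uXXXX is a legal JSON string escape and decodes back to "<".
print(json.loads(r'"\u003C/script>"'))   # </script>

# \xXX is a JavaScript escape only; a JSON parser rejects it.
print(is_valid_json(r'"\u003C/script>"'))   # True
print(is_valid_json(r'"\x3C/script>"'))     # False
```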

Bob Stein answered Sep 27 '22 21:09