Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How do I escape closing '/' in HTML tags in JSON with Python?

Note: This question is very close to Embedding JSON objects in script tags, but the responses to that question provides what I already know (that in JSON / == \/). I want to know how to do that escaping.

The HTML spec prohibits closed HTML tags anywhere within a <script> element. So, this causes parse errors:

<script>
var assets = [{
  "asset_created": null, 
  "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", 
  "body": "<script></script>"
}];
</script>

In my case, I'm generating the invalid situation by rendering a JSON string inside a Django template, i.e.:

<script>
var assets = {{ json_string }};
</script>

I know that JSON parses \/ the same as /, so if I can just escape my closing HTML tags in the JSON string, I'll be good. But, I'm not sure of the best way to do this.

My naive approach would just be this:

json_string = '[{"asset_created": null, "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "<script></script>"}]'
escaped_json_string = json_string.replace('</', r'<\/')

Is there a better way? Or any gotchas that I'm overlooking?

like image 397
Geoffrey Hing Avatar asked Mar 08 '13 15:03

Geoffrey Hing


People also ask

How escape HTML tag JSON?

However, if your data contains HTML, there are certain things that you need to do to keep the browser happy when using your JSON data within Javascript. Escape the forward slash in HTML end tags. <div>Hello World!

How do I escape a string in JSON?

The only difference between Java strings and Json strings is that in Json, forward-slash (/) is escaped.

How add JSON link in HTML?

Just add the backslash ( \ ) symbol before the beginning of a double quote and the before the end of a double quote. I like to tweet about JSON and post helpful code snippets. Follow me there if you would like some too!


1 Answers

Updated Answer

Okay I assumed a few things incorrectly. For escaping the JSON, the simplejson library has a method JSONEncoderForHTML than can be used. You may need to install it via pip or easy_install if the code doesn't work. Then you can do something like this:

import simplejson
asset_json=simplejson.loads(json_string)
encoded=simplejson.encoder.JSONEncoderForHTML().encode(assets_json)

which encoded will give you this:

'{"asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", "body": "\\u003cscript\\u003e\\u003c/script\\u003e", "asset_created": null}'

This is a more overall solution than the slash replace as it handles other encoding caveats as well.

The loads part is a side-effect of having the JSON already encoded. This can be avoided by not using DJango if possible to generate the JSON and instead using simplejson:

simplejson.dumps(your_object_to_encode, cls=simplejson.encoder.JSONEncoderForHTML)

Old Answer

Try wrapping your script in CDATA:

<script>
//<![CDATA[
var assets = [{
  "asset_created": null, 
  "asset_id": "575155948f7d4c4ebccb02d4e8f84d2f", 
  "body": "<script></script>"
}];
//]]>
</script>

It's meant to flag the parser on this sort of thing. Otherwise you'll need to use the character escapes that have been mentioned.

like image 110
cwgem Avatar answered Oct 04 '22 09:10

cwgem