
Rails: why does '.to_json' escape HTML entities?

This is what happens in a rails console (Rails v4.0.4):

irb(main):020:0> "pepe&pepe <juan>".to_json
=> "\"pepe\\u0026pepe \\u003Cjuan\\u003E\""

This is what happens in an irb console (Ruby 2.0.0p247):

irb(main):014:0> "pepe&pepe <juan>".to_json
=> "\"pepe&pepe <juan>\""

I know I can override this behaviour, but my question is: why does Rails do this by default? What are the consequences of not doing it? To me it looks like a good idea to override this behaviour and not escape HTML entities, but I'm sure I'm missing something.

fguillen asked Nov 15 '25 12:11



1 Answer

In Rails, JSON is frequently written into HTML contexts, such as script elements and attributes.

This default escaping avoids injection in such cases: characters that have meaning in a particular context and are not escaped pose an injection / XSS risk. [1]

If, and only if, the JSON is used in a context where this is not a concern, the escaping can be safely disabled: the default simply favors 'safety'. Since this HTML-safe transformation can be done without breaking any standard and without changing the JSON equivalency [2], it is what the Rails team has done - good for them! [3]
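The equivalency claim can be checked with plain Ruby. The following is a minimal sketch using only the standard json library; `html_safe_json` and `HTML_ESCAPES` are hypothetical names made up for illustration, and the helper only mimics the escapes visible in the Rails console output from the question, it is not Rails' own implementation.

```ruby
require 'json'

# Hypothetical mapping for illustration: the three characters that the
# Rails console output in the question shows as \uXXXX escapes.
HTML_ESCAPES = { '&' => '\u0026', '<' => '\u003C', '>' => '\u003E' }.freeze

# Hypothetical helper, not part of Rails or the json library.
# In valid JSON, '&', '<' and '>' can only occur inside string values,
# so replacing them with \uXXXX escapes keeps the document valid.
def html_safe_json(value)
  JSON.generate(value).gsub(/[&<>]/, HTML_ESCAPES)
end

data  = { "name" => "pepe&pepe <juan>" }
plain = JSON.generate(data)
safe  = html_safe_json(data)

puts plain
puts safe

# Both forms parse back to the same Ruby value.
puts JSON.parse(plain) == JSON.parse(safe)
```

The two strings differ byte for byte, yet a JSON parser decodes them to the same data, which is why the escaped form can be the default without losing information.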

In particular this avoids nasty 'JSON' [2] like:

var x = {"foo": "</script><script>alert('owned')</script>"};

JSON embedded into other HTML constructs, e.g. data attributes, can also be problematic. Even using JSON.parse, which would require an extra encoding step, leaves the same potential issue.
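To make the script-element case concrete, here is a rough sketch in plain Ruby (standard json library only). `script_safe_json`, `SCRIPT_ESCAPES` and `script_tag` are hypothetical helpers invented for this example; they imitate the escaped output shown in the question rather than reproduce what Rails does internally.

```ruby
require 'json'

# Hypothetical mapping, chosen to match the escapes shown in the question.
SCRIPT_ESCAPES = { '&' => '\u0026', '<' => '\u003C', '>' => '\u003E' }.freeze

# Hypothetical helper: JSON with HTML-significant characters escaped.
def script_safe_json(value)
  JSON.generate(value).gsub(/[&<>]/, SCRIPT_ESCAPES)
end

# Hypothetical helper: embeds a JSON string in an inline script element.
def script_tag(json)
  "<script>var x = #{json};</script>"
end

payload = { "foo" => "</script><script>alert('owned')</script>" }

unescaped = script_tag(JSON.generate(payload))
escaped   = script_tag(script_safe_json(payload))

puts unescaped
puts escaped

# Number of literal closing tags in each version; anything beyond the
# element's own closing tag came from the data.
puts unescaped.scan("</script>").length
puts escaped.scan("</script>").length
```

An HTML parser ends the script element at the first literal closing tag it sees, regardless of JavaScript string quoting, so the escaped version is the one where the data cannot terminate the element early.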


[1] The standard HTML-escaping output methods apply to HTML PCDATA contexts, but when emitting JSON into a script element (CDATA) this escaping is not desirable and is purposefully skipped (e.g. with raw).

[2] Here is another answer of mine where I wrote about why such escaping is always valid, as well as a caveat of using JSON as a JavaScript literal. Unlike the notorious and ill-devised 'add slashes', the HTML-safe JSON represents identical information.

[3] JavaScriptSerializer from Microsoft and json_encode in PHP have similar default encoding behavior. The default context in which these libraries/functions are used probably plays a large part in the default HTML-safe configurations.

user2864740 answered Nov 17 '25 01:11
