Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

When outputting JSON content via Javascript, should I HTML escape on the server or client side?

I have an application that consists of a server-side REST API written in PHP, and some client-side Javascript that consumes this API and uses the JSON it produces to render a page. So, a pretty typical setup.

The data provided by the REST API is "untrusted", in the sense that it is fetching user-provided content from a database. So, for example, it might fetch something like:

{
    "message": "<script>alert("Gotcha!")</script>"
}

Obviously, if my client-side code were to render this directly into the page's DOM, I've created an XSS vulnerability. So, this content needs to be HTML-escaped first.

The question is, when outputting untrusted content, should I escape the content on the server side, or the client side? I.e., should my API return the raw content, and then make it the client Javascript code's responsibility to escape the special characters, or should my API return "safe" content:

{
    "message": "&lt;script&gt;alert(&#039;Gotcha!&#039;);&lt;\/script&gt;"
}

that has been already escaped?

On one hand, it seems to be that the client should not have to worry about unsafe data from my server. On the other hand, one could argue that output should always be escaped at the last minute possible, when we know exactly how the data is to be consumed.

Which approach is correct?

Note: There are plenty of questions about handling input and yes, I am aware that client-side code can always be manipulated. This question is about outputting data from my server which may not be trustable.

Update: I looked into what other people are doing, and it does seem that some REST APIs tend to send "unsafe" JSON. Gitter's API actually sends both, which is an interesting idea:

[
    {
        "id":"560ab5d0081f3a9c044d709e",
        "text":"testing the API: <script>alert('hey')</script>",
        "html":"testing the API: &lt;script&gt;alert(&#39;hey&#39;)&lt;/script&gt;",
        "sent":"2015-09-29T16:01:19.999Z",
        "fromUser":{
            ...
        },"unread":false,
        "readBy":0,
        "urls":[],
        "mentions":[],
        "issues":[],
        "meta":[],
        "v":1
    }
]

Notice that they send the raw content in the text key, and then the HTML-escaped version in the html key. Not a bad idea, IMO.

I have accepted an answer, but I don't believe this is a cut-and-dry problem. I would like to encourage further discussion on this topic.

like image 254
alexw Avatar asked Sep 28 '15 19:09

alexw


1 Answers

Escape on the client side only.

The reason to escape on the client side is security: the server's output is the client's input, and so the client should not trust it. If you assume that the input is already escaped, then you potentially open yourself to client attacks via, for example, a malicious reverse-proxy. This is not so different from why you should always validate input on the server side, even if you also include client-side validation.

The reason not to escape on the server side is separation of concerns: the server should not assume that the client intends to render the data as HTML. The server's output should be as media-neutral as possible (given the constraints of JSON and the data structure, of course), so that the client can most easily transform it into whatever format is needed.

like image 114
The Spooniest Avatar answered Oct 08 '22 10:10

The Spooniest