cleaning JSON for XSS before deserializing

I am using the Newtonsoft JSON (Json.NET) deserializer. How can I clean JSON for XSS (cross-site scripting)? Should I clean the JSON string before deserializing it, or write some kind of custom converter/sanitizer? I am not 100% sure about the best way to approach this.

Below is an example of JSON that has a dangerous script injected and needs "cleaning." I want to handle this before I deserialize it. But we need to assume all kinds of XSS scenarios, including Base64-encoded script etc., so the problem is more complex than a simple regex string replace.

{ "MyVar" : "hello<script>bad script code</script>world" } 

Here is a snapshot of my deserializer ( JSON -> Object ):

public T Deserialize<T>(string json)
{
    T obj;

    var cleanedJson = cleanJSON(json); // OPTION 1: sanitize the raw string here

    var customConverter = new JSONSanitizer(); // OPTION 2: create a custom converter

    // Deserialize the sanitized string, not the raw input
    obj = JsonConvert.DeserializeObject<T>(cleanedJson, customConverter);

    return obj;
}
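A minimal sketch of OPTION 2 using Json.NET's generic JsonConverter<T> base class (this assumes a Json.NET version that provides it). HtmlEncodingStringConverter, MyModel and SanitizingDeserializer are hypothetical names. The converter HTML-encodes every string as it is read, which fits values that will later be shown as plain text; values that must be rendered as HTML need a whitelist sanitizer instead, and data encoded at input time must not be encoded again on output.

```csharp
using System;
using System.Globalization;
using System.Net;
using Newtonsoft.Json;

// Hypothetical converter: HTML-encodes every JSON string as it is deserialized.
public class HtmlEncodingStringConverter : JsonConverter<string>
{
    public override string ReadJson(JsonReader reader, Type objectType, string existingValue,
                                    bool hasExistingValue, JsonSerializer serializer)
    {
        if (reader.TokenType == JsonToken.Null)
            return null;

        // reader.Value holds the current token's value (a string for JSON strings).
        string raw = Convert.ToString(reader.Value, CultureInfo.InvariantCulture);

        // Encodes <, >, &, " and ' so the value is inert when written into HTML as text.
        return WebUtility.HtmlEncode(raw);
    }

    public override void WriteJson(JsonWriter writer, string value, JsonSerializer serializer)
    {
        // Values are written back unchanged; encoding already happened on the way in.
        writer.WriteValue(value);
    }
}

// Hypothetical model matching the example JSON.
public class MyModel
{
    public string MyVar { get; set; }
}

public static class SanitizingDeserializer
{
    // Same shape as the Deserialize<T> method above, with the converter plugged in.
    public static T Deserialize<T>(string json) =>
        JsonConvert.DeserializeObject<T>(json, new HtmlEncodingStringConverter());
}
```

For the example JSON above, SanitizingDeserializer.Deserialize<MyModel>(json).MyVar comes back as hello&lt;script&gt;bad script code&lt;/script&gt;world.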

The JSON is posted from a third-party UI, so it's fairly exposed, hence the server-side validation. From there, it gets deserialized into all kinds of objects and is usually stored in a DB, later to be retrieved and output directly into an HTML-based UI, so script injection must be mitigated.

MarzSocks asked Sep 21 '15 15:09


People also ask

Do I need to sanitize JSON?

You cannot effectively validate JSON containing multiple fields as JSON. Minimal sanitation might be feasible, but for validation and sanitation to be most effective, you should parse the JSON into an array or object and validate and sanitize each field according to what sort of data it is supposed to contain.
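A minimal sketch of parse-then-validate in C# with Json.NET's JObject. The field names Name and Age and their rules are hypothetical; each field is checked against the kind of data it should contain, and unexpected fields are reported rather than silently accepted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

public static class FieldValidator
{
    // Hypothetical rules: each expected field maps to a check for the data it should hold.
    static readonly Dictionary<string, Func<JToken, bool>> Rules = new Dictionary<string, Func<JToken, bool>>
    {
        // A display name: letters, spaces and apostrophes only, 1 to 40 characters.
        ["Name"] = t => t.Type == JTokenType.String
                        && Regex.IsMatch((string)t, @"^[\p{L}\p{Zs}']{1,40}$"),
        // An age: an integer in a plausible range.
        ["Age"] = t => t.Type == JTokenType.Integer && (long)t >= 0 && (long)t <= 150,
    };

    // Parses the JSON first, then validates each field on its own terms.
    // Returns the problems found; an empty list means the object passed.
    public static List<string> Validate(string json)
    {
        var errors = new List<string>();
        JObject obj = JObject.Parse(json);

        foreach (JProperty prop in obj.Properties())
        {
            if (!Rules.TryGetValue(prop.Name, out var rule))
                errors.Add($"Unexpected field '{prop.Name}'");
            else if (!rule(prop.Value))
                errors.Add($"Invalid value for '{prop.Name}'");
        }
        return errors;
    }
}
```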

Is XSS possible with JSON?

XSS occurs when a user-manipulatable value is displayed on a web page without escaping it, allowing someone to inject JavaScript or HTML into the page. JSON output is not exempt: in Ruby, for example, calls to Hash#to_json can be used to trigger XSS.

Does JSON Stringify prevent XSS?

JSON.stringify() is perhaps one of the most mundane APIs in modern browsers. Translating a JavaScript object into a string-based representation is hardly thrilling. But when the stars align, a simple JSON serialization operation can result in a significant XSS vulnerability.
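One concrete case, sketched in C# with Json.NET: JSON written inside an inline <script> block breaks out of it if a string value contains </script>. Setting StringEscapeHandling.EscapeHtml makes Json.NET write HTML-sensitive characters as \u escapes, which JavaScript reads back as the same characters. ScriptSafeJson is a hypothetical name.

```csharp
using Newtonsoft.Json;

public static class ScriptSafeJson
{
    static readonly JsonSerializerSettings Settings = new JsonSerializerSettings
    {
        // Escapes <, >, &, ' and " inside string values as \u003c, \u003e and so on,
        // so no value can close the surrounding <script> element.
        StringEscapeHandling = StringEscapeHandling.EscapeHtml
    };

    // Serializes a value for embedding in an inline <script> block.
    public static string Serialize(object value) =>
        JsonConvert.SerializeObject(value, Settings);
}
```

Serializing new { MyVar = "</script><script>alert(1)</script>" } this way yields output with no literal < or > characters.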

What is JSON sanitization?

In the OWASP JSON Sanitizer library, for example, the sanitize method converts JSON-like input into well-formed JSON. It returns the input string without allocating a new buffer when the input is already valid JSON that meets the sanitizer's output guarantees, so on input that is usually well formed it has minimal memory overhead. It runs in O(n) time, where n is the input length in UTF-16 code units.


2 Answers

OK, I am going to try to keep this fairly short, because writing up the whole thing would be a lot of work. Essentially, you need to focus on the context in which the data will be used when deciding how to sanitize it. From comments on the original post, it sounds like some values in the JSON will be rendered as HTML, and this HTML comes from an untrusted source.

The first step is to extract whichever JSON values need to be sanitized as HTML, and run each of those values through an HTML parser, stripping away everything that is not in a whitelist. Don't forget that you will also need a whitelist for attributes.

HTML Agility Pack is a good starting place for parsing HTML in C#. How to do this part is a separate question in my opinion - and probably a duplicate of the linked question.
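A minimal whitelist sketch, assuming the HtmlAgilityPack NuGet package. The tag lists are hypothetical, and this version allows no attributes at all; allowing <a href> would also need the URL scheme check described below. HtmlAgilityPack does not parse exactly like a browser, so treat this as an illustration rather than a hardened sanitizer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlWhitelist
{
    // Hypothetical whitelist: simple formatting tags only.
    static readonly HashSet<string> AllowedTags = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "b", "i", "em", "strong", "p", "br", "ul", "ol", "li" };

    // Elements whose content is code rather than text: removed together with their children.
    static readonly HashSet<string> DropWithContent = new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        { "script", "style", "iframe", "object", "embed", "svg", "math" };

    public static string Sanitize(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        Clean(doc.DocumentNode);
        return doc.DocumentNode.InnerHtml;
    }

    static void Clean(HtmlNode parent)
    {
        // Iterate over a snapshot, because nodes are removed inside the loop.
        foreach (HtmlNode node in parent.ChildNodes.ToList())
        {
            if (node.NodeType == HtmlNodeType.Comment)
            {
                node.Remove();
            }
            else if (node.NodeType == HtmlNodeType.Element)
            {
                if (DropWithContent.Contains(node.Name))
                {
                    node.Remove();
                    continue;
                }

                Clean(node); // clean descendants before deciding about this node

                if (AllowedTags.Contains(node.Name))
                    node.Attributes.RemoveAll();     // the attribute whitelist is empty here
                else
                    parent.RemoveChild(node, true);  // unwrap: drop the tag, keep its children
            }
            // Text nodes are kept as they are.
        }
    }
}
```

Script-like elements are removed together with their content, while other unknown tags are unwrapped so their text survives.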

Your worry about Base64 strings seems a little over-emphasized in my opinion. It's not like you can simply put aW5zZXJ0IGg0eCBoZXJl into an HTML document and the browser will render it. It can be abused through JavaScript (which your whitelist will prevent) and, to some extent, through data: URLs (but this isn't THAT bad, as the JavaScript will run in the context of the data page. Not good, but you aren't automatically gobbling up cookies with this). If you have to allow <a> tags, part of the process needs to be validating that the URL is http(s) (or whatever schemes you want to allow).
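A minimal sketch of the http(s) check for link targets. LinkValidator.IsSafeHttpUrl is a hypothetical helper; it decodes HTML entities first so an encoded scheme such as javascript&#58; is caught, then accepts only absolute URLs whose scheme is http or https.

```csharp
using System;
using System.Net;

public static class LinkValidator
{
    // Hypothetical helper: accepts only absolute http(s) URLs for href/src values.
    public static bool IsSafeHttpUrl(string value)
    {
        if (string.IsNullOrWhiteSpace(value))
            return false;

        // Decode entities first, so "javascript&#58;alert(1)" is checked as "javascript:alert(1)".
        string decoded = WebUtility.HtmlDecode(value).Trim();

        // Uri.Scheme is lowercase, so "HTTP://..." still matches; data:, javascript:,
        // file: and anything unparseable are rejected.
        return Uri.TryCreate(decoded, UriKind.Absolute, out Uri uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
    }
}
```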

Ideally, you would avoid this uncomfortable situation and instead use something like Markdown - then you could simply escape the HTML string, but this is not always something we can control. You'd still have to do some URL validation, though.
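Escaping is the simple case: when a stored value is written into the page as text, encode it at that point. RenderParagraph is a hypothetical helper; WebUtility.HtmlEncode covers element content and quoted attribute values, while other contexts (URLs, JavaScript, CSS) need their own encoding or validation.

```csharp
using System.Net;

public static class SafeRender
{
    // Hypothetical rendering helper: user text enters the page only after HTML encoding,
    // so any markup in the value is displayed as literal text instead of being executed.
    public static string RenderParagraph(string userText) =>
        "<p>" + WebUtility.HtmlEncode(userText) + "</p>";
}
```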

Gray answered Oct 03 '22 07:10


Interesting! Thanks for asking. We normally use html.urlencode with Web Forms. I have an enterprise Web API running that has validations like this; we created a custom regex to validate the input. Please have a look at this MSDN link.

This is the sample model used to parse the request; call it KeyValue:

public class KeyValue
{
    public string Key { get; set; }
}

Step 1: Trying with a custom regex

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using Newtonsoft.Json.Linq;

var json = @"[{ 'MyVar' : 'hello<script>bad script code</script>world' }]";

JArray readArray = JArray.Parse(json);
IList<KeyValue> blogPost = readArray.Select(p => new KeyValue { Key = (string)p["MyVar"] }).ToList();

// Check each value: calling ToString() on the list itself only returns its type name.
foreach (KeyValue item in blogPost)
{
    if (!Regex.IsMatch(item.Key ?? string.Empty,
       @"^[\p{L}\p{Zs}\p{Lu}\p{Ll}\']{1,40}$"))
        Console.WriteLine("InValid");
}
// ^       anchors the match at the start of the string.
// \p{..}  matches any character in the named Unicode category {..}.
// {L}     any letter (this already includes {Lu} and {Ll}).
// {Lu}    uppercase letters.
// {Ll}    lowercase letters.
// {Zs}    space separators, such as the ordinary space.
// \'      a literal apostrophe.
// {1,40}  between 1 and 40 characters, inclusive.
// $       anchors the match at the end of the string.

Step 2: Using HttpUtility.UrlEncode, as this Newtonsoft website link suggests:

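using System.Linq;
using System.Web;             // HttpUtility
using Newtonsoft.Json.Linq;

string json = @"[{ 'MyVar' : 'hello<script>bad script code</script>world' }]";
JArray readArray = JArray.Parse(json);
// UrlEncode percent-encodes < and > (spaces become +); for HTML output, HttpUtility.HtmlEncode is the matching encoder.
var blogPost = readArray.Select(p => new KeyValue { Key = HttpUtility.UrlEncode((string)p["MyVar"]) }).ToList();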

staticvoidmain answered Oct 03 '22 08:10