 

Is json.loads() vulnerable to arbitrary code execution?

Is json.loads from Python's standard json module vulnerable to arbitrary code execution or any other security problems?

My application can receive JSON messages from non-trustworthy sources.

asked Aug 07 '16 10:08 by FrozenHeart




1 Answer

Note that the answer below relates to the default Python 3.4 installation on Windows 10 (64-bit). It also only examines the pure-Python (py) scanner, not the C scanner.

For the source files, see https://hg.python.org/cpython/file/tip/Lib/json or find them in your local Python installation.
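If you want to read the same files on your own machine, Python can print where they live. A minimal sketch; the paths depend on your installation, and some distributions ship only compiled .pyc files, in which case inspect.getsource() may not find the source.

```python
import inspect
import json.decoder
import json.scanner

# Print where the json decoder and scanner modules live in this installation.
for module in (json.decoder, json.scanner):
    print(module.__name__, module.__file__)

# inspect.getsource() returns the Python source of a function when it is
# available, e.g. the JSONArray parser discussed below.
source = inspect.getsource(json.decoder.JSONArray)
print(source.splitlines()[0])
```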

Research

Read this research alongside the reference implementation at the bottom of this post.

The parsing functions called by json.loads(s) are defined in \Lib\json\scanner.py:

parse_object = context.parse_object
parse_array = context.parse_array
parse_string = context.parse_string
parse_float = context.parse_float
parse_int = context.parse_int
parse_constant = context.parse_constant

with context being an instance of the JSONDecoder class which is defined in \Lib\json\decoder.py and uses the following parsers:

self.parse_float = parse_float or float
self.parse_int = parse_int or int
self.parse_constant = parse_constant or _CONSTANTS.__getitem__
self.parse_string = scanstring
self.parse_object = JSONObject
self.parse_array = JSONArray
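These parsers are plain attributes on the decoder, so the defaults are easy to confirm interactively. A minimal sketch, assuming a Python 3 version whose JSONDecoder still stores them under these attribute names (as the CPython sources quoted here do); swapping in Decimal is just an illustration of a caller-supplied parser.

```python
import json
import json.decoder
from decimal import Decimal

decoder = json.JSONDecoder()

# With no arguments, the decoder falls back to the built-in converters
# and the module-level parser functions listed above.
print(decoder.parse_float is float)
print(decoder.parse_int is int)
print(decoder.parse_object is json.decoder.JSONObject)
print(decoder.parse_array is json.decoder.JSONArray)

# The number parsers can be replaced by the caller,
# e.g. Decimal instead of float for fractional numbers.
value = json.loads("[3.14, 2]", parse_float=Decimal)
print(value)
```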

From here we can look at each individual parser to determine whether or not it is susceptible to arbitrary code execution:


parse_float

This uses the built-in float function, so it is safe.
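parse_float only ever receives the characters the number pattern has already matched, never arbitrary text. A small sketch to observe this; recording_float is a hypothetical helper for illustration.

```python
import json

seen = []

def recording_float(text):
    # Record the exact text the scanner passes in, then convert it with float().
    seen.append(text)
    return float(text)

result = json.loads("[1.5, -2e3, 10]", parse_float=recording_float)
print(result)  # [1.5, -2000.0, 10]
print(seen)    # ['1.5', '-2e3']
```

The integer 10 never reaches parse_float; it goes to parse_int instead.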


parse_int

This uses the built-in int function, so it is safe.
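The default int cannot execute code, but it is also just a callable the caller can replace, for example to bound the cost of converting very long integer literals from untrusted input. A sketch with a hypothetical bounded_int helper and an arbitrary 100-digit cap. (CPython 3.11 and later also limit int() string conversion to 4300 digits by default; see sys.set_int_max_str_digits.)

```python
import json

def bounded_int(text, max_digits=100):
    # Hypothetical guard: reject very long integer literals before int() converts them.
    digits = text.lstrip("-")
    if len(digits) > max_digits:
        raise ValueError("integer literal has {} digits".format(len(digits)))
    return int(text)

print(json.loads("[1, -42]", parse_int=bounded_int))  # [1, -42]

try:
    json.loads("9" * 500, parse_int=bounded_int)
    outcome = "accepted"
except ValueError as exc:
    outcome = "rejected: {}".format(exc)
print(outcome)
```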


parse_constant

_CONSTANTS is defined within the same file as:

_CONSTANTS = {
    '-Infinity': NegInf,
    'Infinity': PosInf,
    'NaN': NaN,
}

so only a simple dictionary lookup is performed, which is safe.
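Because the table has only three keys, the scanner only ever passes one of those three strings to parse_constant. A sketch showing the default behaviour, plus a hypothetical reject_constant callback for callers who want strict RFC 8259 JSON, which has no NaN or Infinity.

```python
import json
import math

# By default the non-standard constants are accepted and looked up in _CONSTANTS.
print(math.isnan(json.loads("NaN")))         # True
print(json.loads("-Infinity") == -math.inf)  # True

def reject_constant(name):
    # Only ever called with 'NaN', 'Infinity' or '-Infinity'.
    raise ValueError("non-standard JSON constant: " + name)

try:
    json.loads("[1, Infinity]", parse_constant=reject_constant)
    outcome = "accepted"
except ValueError as exc:
    outcome = str(exc)
print(outcome)  # non-standard JSON constant: Infinity
```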


parse_string, JSONObject, JSONArray

As can be seen by looking at the implementations at the end of this post, the only external code that could be executed is:

From JSONObject:

  • object_pairs_hook
  • object_hook

From JSONArray (JSONObject calls it as well):

  • scan_once


object_pairs_hook, object_hook

By default, object_pairs_hook and object_hook are None, as set by the decoder's initializer:

def __init__(self, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, strict=True,
        object_pairs_hook=None)
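Both hooks are ordinary callables that only run if the calling program passes them in; the JSON text itself has no way to supply one. A short sketch with a hypothetical tracing_hook to show when they are called:

```python
import json

calls = []

def tracing_hook(obj):
    # Record the keys of each decoded object; nested objects finish first.
    calls.append(sorted(obj))
    return obj

json.loads('{"outer": {"inner": 1}}', object_hook=tracing_hook)
print(calls)  # [['inner'], ['outer']]

# object_pairs_hook receives the raw (key, value) list, so duplicate keys survive.
pairs = json.loads('{"a": 1, "a": 2}', object_pairs_hook=list)
print(pairs)  # [('a', 1), ('a', 2)]
```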


scan_once

scan_once is defined as:

self.scan_once = scanner.make_scanner(self) 

The source for it can be found in \Lib\json\scanner.py, from which we can see that scan_once simply calls the appropriate parser for each part of the JSON input.
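scan_once can be called directly to see this dispatch in action. A sketch; note that scan_once is an internal attribute rather than documented API (JSONDecoder.raw_decode is the public equivalent).

```python
import json

decoder = json.JSONDecoder()

# scan_once(string, idx) parses one JSON value starting at idx and
# returns (value, index just past that value).
value, end = decoder.scan_once('[1, "two", null] trailing', 0)
print(value, end)  # [1, 'two', None] 16

# If no parser matches the character at idx, it raises StopIteration.
try:
    decoder.scan_once("@", 0)
    outcome = "parsed"
except StopIteration:
    outcome = "no value at index 0"
print(outcome)
```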


Conclusion

From the above and the reference implementation, it can be seen that as long as the JSON decoder uses the default scanner, arbitrary code will not be executed. It is probably possible to make a custom decoder execute arbitrary code through its __init__ parameters (the hooks and parse_* callables), but short of that I don't think so.
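A quick way to see this conclusion in practice: with default arguments, even input that looks like code comes back as plain dicts, lists, strings, numbers, booleans and None. A sketch that illustrates rather than proves the claim; only_plain_types is a hypothetical helper.

```python
import json

JSON_TYPES = (dict, list, str, int, float, bool, type(None))

def only_plain_types(obj):
    # Recursively check that a decoded value contains only inert JSON data types.
    if isinstance(obj, dict):
        return all(isinstance(k, str) and only_plain_types(v) for k, v in obj.items())
    if isinstance(obj, list):
        return all(only_plain_types(item) for item in obj)
    return isinstance(obj, JSON_TYPES)

# Looks like an instruction, but decodes to ordinary strings and lists.
hostile = '{"__class__": "os.system", "args": ["echo pwned"]}'
data = json.loads(hostile)
print(only_plain_types(data))  # True

# Python code is not JSON at all, so the decoder simply rejects it.
try:
    json.loads('__import__("os").system("echo pwned")')
    outcome = "parsed"
except ValueError:
    outcome = "rejected"
print(outcome)  # rejected
```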


Implementation

BACKSLASH

BACKSLASH = {
    '"': '"', '\\': '\\', '/': '/',
    'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t',
}

STRINGCHUNK

STRINGCHUNK = re.compile(r'(.*?)(["\\\x00-\x1f])', FLAGS) 
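The regex consumes plain characters up to the next quote, backslash or control character, so each loop iteration in py_scanstring handles exactly one such terminator. A small sketch that recompiles the pattern, assuming the same FLAGS that json.decoder uses (re.VERBOSE | re.MULTILINE | re.DOTALL):

```python
import re

# Assumed to match json.decoder's FLAGS.
FLAGS = re.VERBOSE | re.MULTILINE | re.DOTALL
STRINGCHUNK = re.compile(r'(.*?)(["\\\x00-\x1f])', FLAGS)

# Body of a JSON string after its opening quote: hello\nworld"
s = 'hello\\nworld"'
chunk = STRINGCHUNK.match(s, 0)
print(chunk.groups(), chunk.end())  # ('hello', '\\') 6

# The scanner then looks up s[6] ('n') in BACKSLASH and resumes after it.
chunk = STRINGCHUNK.match(s, 7)
print(chunk.groups(), chunk.end())  # ('world', '"') 13
```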

scanstring

def py_scanstring(s, end, strict=True,
        _b=BACKSLASH, _m=STRINGCHUNK.match):
    """Scan the string s for a JSON string. End is the index of the
    character in s after the quote that started the JSON string.
    Unescapes all valid JSON string escape sequences and raises ValueError
    on attempt to decode an invalid string. If strict is False then literal
    control characters are allowed in the string.

    Returns a tuple of the decoded string and the index of the character in s
    after the end quote."""
    chunks = []
    _append = chunks.append
    begin = end - 1
    while 1:
        chunk = _m(s, end)
        if chunk is None:
            raise ValueError(
                errmsg("Unterminated string starting at", s, begin))
        end = chunk.end()
        content, terminator = chunk.groups()
        # Content is contains zero or more unescaped string characters
        if content:
            _append(content)
        # Terminator is the end of string, a literal control character,
        # or a backslash denoting that an escape sequence follows
        if terminator == '"':
            break
        elif terminator != '\\':
            if strict:
                #msg = "Invalid control character %r at" % (terminator,)
                msg = "Invalid control character {0!r} at".format(terminator)
                raise ValueError(errmsg(msg, s, end))
            else:
                _append(terminator)
                continue
        try:
            esc = s[end]
        except IndexError:
            raise ValueError(
                errmsg("Unterminated string starting at", s, begin))
        # If not a unicode escape sequence, must be in the lookup table
        if esc != 'u':
            try:
                char = _b[esc]
            except KeyError:
                msg = "Invalid \\escape: {0!r}".format(esc)
                raise ValueError(errmsg(msg, s, end))
            end += 1
        else:
            uni = _decode_uXXXX(s, end)
            end += 5
            if 0xd800 <= uni <= 0xdbff and s[end:end + 2] == '\\u':
                uni2 = _decode_uXXXX(s, end + 1)
                if 0xdc00 <= uni2 <= 0xdfff:
                    uni = 0x10000 + (((uni - 0xd800) << 10) | (uni2 - 0xdc00))
                    end += 6
            char = chr(uni)
        _append(char)
    return ''.join(chunks), end

scanstring = c_scanstring or py_scanstring
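Everything scanstring emits comes from slicing the input, the BACKSLASH table, or chr() on a number, so a malformed string can at worst raise ValueError. The surrogate-pair arithmetic near the end can be checked by hand; a sketch in which combine_surrogates is a hypothetical helper mirroring that expression:

```python
import json

def combine_surrogates(hi, lo):
    # Same arithmetic py_scanstring applies to a high/low surrogate escape pair.
    return 0x10000 + (((hi - 0xD800) << 10) | (lo - 0xDC00))

print(hex(combine_surrogates(0xD83D, 0xDE00)))        # 0x1f600
print(json.loads(r'"\ud83d\ude00"') == "\U0001F600")  # True

# A literal (unescaped) tab is a control character: rejected when strict=True.
try:
    json.loads('"a\tb"')
    strict_outcome = "accepted"
except ValueError:
    strict_outcome = "rejected"
print(strict_outcome)                            # rejected
print(repr(json.loads('"a\tb"', strict=False)))  # 'a\tb'
```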

WHITESPACE

WHITESPACE = re.compile(r'[ \t\n\r]*', FLAGS) 

WHITESPACE_STR

WHITESPACE_STR = ' \t\n\r' 

JSONObject

def JSONObject(s_and_end, strict, scan_once, object_hook, object_pairs_hook,
               memo=None, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    pairs = []
    pairs_append = pairs.append
    # Backwards compatibility
    if memo is None:
        memo = {}
    memo_get = memo.setdefault
    # Use a slice to prevent IndexError from being raised, the following
    # check will raise a more specific ValueError if the string is empty
    nextchar = s[end:end + 1]
    # Normally we expect nextchar == '"'
    if nextchar != '"':
        if nextchar in _ws:
            end = _w(s, end).end()
            nextchar = s[end:end + 1]
        # Trivial empty object
        if nextchar == '}':
            if object_pairs_hook is not None:
                result = object_pairs_hook(pairs)
                return result, end + 1
            pairs = {}
            if object_hook is not None:
                pairs = object_hook(pairs)
            return pairs, end + 1
        elif nextchar != '"':
            raise ValueError(errmsg(
                "Expecting property name enclosed in double quotes", s, end))
    end += 1
    while True:
        key, end = scanstring(s, end, strict)
        key = memo_get(key, key)
        # To skip some function call overhead we optimize the fast paths where
        # the JSON key separator is ": " or just ":".
        if s[end:end + 1] != ':':
            end = _w(s, end).end()
            if s[end:end + 1] != ':':
                raise ValueError(errmsg("Expecting ':' delimiter", s, end))
        end += 1

        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise ValueError(errmsg("Expecting value", s, err.value)) from None
        pairs_append((key, value))
        try:
            nextchar = s[end]
            if nextchar in _ws:
                end = _w(s, end + 1).end()
                nextchar = s[end]
        except IndexError:
            nextchar = ''
        end += 1

        if nextchar == '}':
            break
        elif nextchar != ',':
            raise ValueError(errmsg("Expecting ',' delimiter", s, end - 1))
        end = _w(s, end).end()
        nextchar = s[end:end + 1]
        end += 1
        if nextchar != '"':
            raise ValueError(errmsg(
                "Expecting property name enclosed in double quotes", s, end - 1))
    if object_pairs_hook is not None:
        result = object_pairs_hook(pairs)
        return result, end
    pairs = dict(pairs)
    if object_hook is not None:
        pairs = object_hook(pairs)
    return pairs, end
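Two consequences of this code are easy to check: duplicate keys are collapsed by pairs = dict(pairs), and malformed keys or separators raise ValueError (json.JSONDecodeError, a ValueError subclass, on Python 3.5+). A short sketch; the exact error messages vary between versions, so only the exception type is checked here.

```python
import json

# pairs = dict(pairs): when a key repeats, the last value wins.
print(json.loads('{"a": 1, "a": 2}'))  # {'a': 2}

# Keys must be double-quoted JSON strings followed by ':'; anything else is rejected.
outcomes = {}
for text in ("{'a': 1}", '{a: 1}', '{"a" 1}'):
    try:
        json.loads(text)
        outcomes[text] = "parsed"
    except ValueError:
        outcomes[text] = "rejected"
print(outcomes)
```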

JSONArray

def JSONArray(s_and_end, scan_once, _w=WHITESPACE.match, _ws=WHITESPACE_STR):
    s, end = s_and_end
    values = []
    nextchar = s[end:end + 1]
    if nextchar in _ws:
        end = _w(s, end + 1).end()
        nextchar = s[end:end + 1]
    # Look-ahead for trivial empty array
    if nextchar == ']':
        return values, end + 1
    _append = values.append
    while True:
        try:
            value, end = scan_once(s, end)
        except StopIteration as err:
            raise ValueError(errmsg("Expecting value", s, err.value)) from None
        _append(value)
        nextchar = s[end:end + 1]
        if nextchar in _ws:
            end = _w(s, end + 1).end()
            nextchar = s[end:end + 1]
        end += 1
        if nextchar == ']':
            break
        elif nextchar != ',':
            raise ValueError(errmsg("Expecting ',' delimiter", s, end - 1))
        try:
            if s[end] in _ws:
                end += 1
                if s[end] in _ws:
                    end = _w(s, end + 1).end()
        except IndexError:
            pass

    return values, end
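JSONArray hands every element to scan_once, which calls back into JSONArray (or JSONObject) for nested containers, so nesting depth turns directly into recursion depth. A sketch, assuming Python 3.5+ for RecursionError; the exact limit and message vary by version and by whether the C or pure-Python scanner is in use.

```python
import json

# Each nesting level goes scan_once -> JSONArray -> scan_once -> ...,
# so very deep nesting eventually exceeds the recursion limit.
depth = 1_000_000
try:
    json.loads("[" * depth + "]" * depth)
    outcome = "parsed"
except RecursionError:
    outcome = "RecursionError"
print(outcome)

# Moderate nesting is fine.
print(json.loads("[[[]]]"))  # [[[]]]
```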

scanner.make_scanner

def py_make_scanner(context):
    parse_object = context.parse_object
    parse_array = context.parse_array
    parse_string = context.parse_string
    match_number = NUMBER_RE.match
    strict = context.strict
    parse_float = context.parse_float
    parse_int = context.parse_int
    parse_constant = context.parse_constant
    object_hook = context.object_hook
    object_pairs_hook = context.object_pairs_hook
    memo = context.memo

    def _scan_once(string, idx):
        try:
            nextchar = string[idx]
        except IndexError:
            raise StopIteration(idx)

        if nextchar == '"':
            return parse_string(string, idx + 1, strict)
        elif nextchar == '{':
            return parse_object((string, idx + 1), strict,
                _scan_once, object_hook, object_pairs_hook, memo)
        elif nextchar == '[':
            return parse_array((string, idx + 1), _scan_once)
        elif nextchar == 'n' and string[idx:idx + 4] == 'null':
            return None, idx + 4
        elif nextchar == 't' and string[idx:idx + 4] == 'true':
            return True, idx + 4
        elif nextchar == 'f' and string[idx:idx + 5] == 'false':
            return False, idx + 5

        m = match_number(string, idx)
        if m is not None:
            integer, frac, exp = m.groups()
            if frac or exp:
                res = parse_float(integer + (frac or '') + (exp or ''))
            else:
                res = parse_int(integer)
            return res, m.end()
        elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
            return parse_constant('NaN'), idx + 3
        elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
            return parse_constant('Infinity'), idx + 8
        elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
            return parse_constant('-Infinity'), idx + 9
        else:
            raise StopIteration(idx)

    def scan_once(string, idx):
        try:
            return _scan_once(string, idx)
        finally:
            memo.clear()

    return _scan_once

make_scanner = c_make_scanner or py_make_scanner
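Since this analysis covers only the pure-Python scanner, it can be useful to run that scanner explicitly and compare its output with the default (usually C) one. A sketch; replacing scan_once on a decoder instance relies on CPython implementation details (c_make_scanner, py_make_scanner), not on documented API.

```python
import json
import json.scanner

# make_scanner prefers the C implementation when the _json extension is available.
print("C scanner in use:", json.scanner.make_scanner is json.scanner.c_make_scanner)

# Force the pure-Python scanner onto one decoder instance.
decoder = json.JSONDecoder()
decoder.scan_once = json.scanner.py_make_scanner(decoder)

text = '{"a": [1, 2.5, null, true], "b": "x"}'
result = decoder.decode(text)
print(result)
print(result == json.loads(text))  # True
```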
answered Oct 24 '22 05:10 by Nick stands with Ukraine