
How to prevent a JSON parser from crashing when there are illegal characters in the JSON?

Due to some communication errors, I sometimes receive JSON strings with illegal characters at the end, for example: "{\"messageType\" : \"Test1\", \"from\" : \"F2D0B5C6-9875-46B5-8D4F\"}����1"

These illegal characters cause my JSON parser to break. I am using the RapidJSON parser (C/C++). Is there a way to filter these unwanted characters out of the string and also verify the integrity of the JSON string?

Deekshith asked Nov 06 '14 09:11



2 Answers

This is not a bug in the parser. The parser checks that any characters between the end of the JSON value and the null terminator are whitespace, and it returns an error code when parsing fails. However, if the buffer has no null terminator, the parser may read past the end of it and cause a segmentation fault, much as strlen() would.

In newer versions of RapidJSON there is a kParseStopWhenDoneFlag. When it is enabled, the parser stops after a complete JSON value and does not read the trailing characters. For example:

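#include <cassert>
#include "rapidjson/document.h"
using namespace rapidjson;
int main() {
    Document d;
    const char* s =  // valid JSON object followed by junk bytes
        "{\"messageType\" : \"Test1\", \"from\" : \"F2D0B5C6-9875-46B5-8D4F\"}����1";
    d.Parse<kParseStopWhenDoneFlag>(s);  // stop once the root value is complete
    assert(!d.HasParseError());
}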

With this flag, the parser stops after reading the closing } and does not report an error.

The flag is not yet documented in the guide. Please refer to the discussion in https://github.com/miloyip/rapidjson/pull/83

Milo Yip answered Oct 03 '22 17:10


I think you should consider rolling your own pre-processing function that goes through every character in the JSON string searching for characters that are not part of your legal set and either removes or replaces them with white space. Then pass the newly repaired string forward to RapidJSON.
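
As a rough sketch of such a pre-processing pass: sanitizeJson is a made-up name, and the "legal set" used here (printable ASCII plus tab, line feed and carriage return) is only an assumption. It would also blank out legitimate non-ASCII text inside string values, so adjust the set to your data.

```cpp
#include <string>

// Hypothetical helper: replaces every byte outside the chosen legal set
// with a space. The legal set (printable ASCII plus tab, LF, CR) is an
// assumption for illustration, not something RapidJSON requires.
std::string sanitizeJson(const std::string& raw) {
    std::string clean = raw;
    for (char& c : clean) {
        const unsigned char u = static_cast<unsigned char>(c);
        const bool legal = (u >= 0x20 && u <= 0x7E) ||
                           u == '\t' || u == '\n' || u == '\r';
        if (!legal) {
            c = ' ';  // keep the length unchanged, just neutralise the byte
        }
    }
    return clean;
}
```

Note that a stray printable character, such as the trailing 1 in the question's example, survives this filter, so you still need to check the parser's error result afterwards.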

It's probably better to detect the comms problem in the first place, since the JSON may then be incomplete or incorrect, and to throw away and retry the entire session. 'Patching up' the data as you want to here solves your short-term problem (the program crashing) but could easily generate data inconsistencies and other, more subtle and difficult to diagnose, problems.

Also, if you are seeing mostly bad data at the end of a string like this, I think you should check carefully that your issue is actually with the comms. The case you give here looks more like a string buffer that was not terminated correctly and has additional junk (uninitialized memory) after the end of the string. Perhaps you expected C++ to clear (set to zero) an allocated buffer?

Elemental answered Oct 03 '22 18:10