 

Unable to parse TAB in JSON files

Tags: python, json

I am running into a parsing problem when loading JSON files that seem to have the TAB character in them.

When I go to http://jsonlint.com/ and enter the part with the TAB character:

{     "My_String": "Foo bar.  Bar foo." } 

The validator complains with:

Parse error on line 2:
{    "My_String": "Foo bar. Bar foo."
------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['

This is literally a copy/paste of the offending JSON text.

I have tried loading this file with json and simplejson without success. How can I load it properly? Should I just pre-process the file and replace each TAB with \t or with a space? Or is there something I am missing here?

Update:

Here is also a problematic example in simplejson:

import simplejson

foo = '{"My_string": "Foo bar.\t Bar foo."}'
simplejson.loads(foo)
# JSONDecodeError: Invalid control character '\t' at: line 1 column 24 (char 23)
Josh asked Nov 05 '13 21:11




1 Answer

From the JSON standard:

Insignificant whitespace is allowed before or after any token. The whitespace characters are: character tabulation (U+0009), line feed (U+000A), carriage return (U+000D), and space (U+0020). Whitespace is not allowed within any token, except that space is allowed in strings.
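The distinction drawn in the quoted passage can be checked with Python's standard json module. The following is only a minimal sketch; the sample strings are made up for illustration, and the exact error text depends on the Python version.

```python
import json

# A tab BETWEEN tokens is insignificant whitespace, so this parses.
# (In a normal Python string literal, \t is a real tab character.)
between_tokens = '{\t"My_string":\t"Foo bar. Bar foo."}'
print(json.loads(between_tokens))

# A literal tab INSIDE a string token is rejected by the default decoder.
inside_string = '{"My_string": "Foo bar.\t Bar foo."}'
try:
    json.loads(inside_string)
    rejected = False
except json.JSONDecodeError as exc:
    rejected = True
    print("rejected:", exc.msg)
```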

This means that a literal tab character is not allowed inside a JSON string. You need to escape it as \t (in a .json file):

{"My_string": "Foo bar.\t Bar foo."} 
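If the files cannot be fixed at their source, one possible workaround (not part of the answer above, and specific to Python) is the strict=False option of the standard json decoder, which tolerates control characters inside strings. A blind search-and-replace of every TAB with \t is riskier, because it would also rewrite tabs used for indentation between tokens. A minimal sketch, with a made-up sample document:

```python
import json

# Tab-indented document that also has a literal tab inside the string value.
raw = '{\n\t"My_string": "Foo bar.\t Bar foo."\n}'

# strict=False lets the decoder accept control characters inside strings.
data = json.loads(raw, strict=False)
print(repr(data["My_string"]))

# Serialising again yields spec-compliant JSON with the tab escaped as \t.
fixed = json.dumps(data)
print(fixed)
```

The re-serialised text can then be written back to disk so that other, stricter parsers accept it as well.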

In addition, if the JSON text is provided inside a Python string literal, you need to double-escape the tab:

foo = '{"My_string": "Foo bar.\\t Bar foo."}' # in a Python source 

Or use a Python raw string literal:

foo = r'{"My_string": "Foo bar.\t Bar foo."}' # in a Python source 
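To see why the two forms above are equivalent while the single-backslash version is not, it may help to compare what each Python literal actually contains. This is only an illustrative sketch; the variable names are arbitrary.

```python
import json

plain = '{"My_string": "Foo bar.\t Bar foo."}'     # Python turns \t into a real tab
doubled = '{"My_string": "Foo bar.\\t Bar foo."}'  # backslash + t reach the JSON parser
raw = r'{"My_string": "Foo bar.\t Bar foo."}'      # raw literal: same text as `doubled`

print("\t" in plain)    # the literal tab is already in the string
print("\t" in raw)      # only the two-character escape sequence is present
print(doubled == raw)

# The JSON parser, not Python, decodes the \t escape into a tab.
value = json.loads(raw)["My_string"]
print(repr(value))
```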
jfs answered Sep 21 '22 04:09