
Python - Converting a string with escape characters to json

JSON objects are printed into my syslog file. I need to extract each one from the log as a string and parse it as JSON. Extracting the text between '{' and '}' is not a problem, but some of the strings contain an escape character, and this causes json.loads to fail.

Here is the problem:

>>> import json
>>> resp = '{"from_hostname": {"value": "mysite.edu\"", "value2": 0, "value3": 1}}'
>>> json.loads(resp)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 41 (char 40)
>>> resp[40]
'"'
>>> resp[41]
','
>>> resp[39]
'"'
>>>

When json sees the \" I guess it only sees " and assumes that the string is over, and it throws the delimiter error.

I tried replacing \" with \\" but that doesn't seem to work.

NOTE: The \" can occur at the beginning or end or in the middle of the string.

How do I get this working?

gixxer asked Jan 19 '16 22:01


2 Answers

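If \" can occur in your string, you have to escape both the \ and the " in the Python string literal, i.e. write \\\", so that a real backslash reaches json.loads: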

import json
resp = '{"from_hostname": {"value": "mysite.edu\\\"", "value2": 0, "value3": 1}}'
print(json.loads(resp))

It prints:

{u'from_hostname': {u'value3': 1, u'value2': 0, u'value': u'mysite.edu"'}}

Is this the right interpretation of your question?

D-E-N answered Sep 22 '22 18:09


The problem is that the backslash escapes the double quote in the Python string literal, so Python consumes it and it is not actually present in the resulting string. Printing the string demonstrates this:

>>> print '{"from_hostname": {"value": "mysite.edu\"", "value2": 0, "value3": 1}}'
{"from_hostname": {"value": "mysite.edu"", "value2": 0, "value3": 1}}

This shows that the backslash is not in the string. For the text to be valid JSON, the double quote inside the value must be escaped, which means that the backslash must actually be present in the string. You can achieve that by escaping the backslash itself with another backslash, i.e. \\:

>>> print '{"from_hostname": {"value": "mysite.edu\\"", "value2": 0, "value3": 1}}'
{"from_hostname": {"value": "mysite.edu\"", "value2": 0, "value3": 1}}

and json.loads() now works:

>>> json.loads('{"from_hostname": {"value": "mysite.edu\\"", "value2": 0, "value3": 1}}')
{u'from_hostname': {u'value3': 1, u'value2': 0, u'value': u'mysite.edu"'}}

Or you could use a raw string:

>>> json.loads(r'{"from_hostname": {"value": "mysite.edu\"", "value2": 0, "value3": 1}}')
{u'from_hostname': {u'value3': 1, u'value2': 0, u'value': u'mysite.edu"'}}
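
To make the difference between the three literals concrete, here is a minimal sketch (Python 3 syntax, with a shortened JSON object chosen only for illustration). The variable names plain, doubled and raw are my own, not from the question:

```python
import json

# Shortened, illustrative JSON text; only the escaping differs between the three.
plain = '{"value": "mysite.edu\""}'     # \" collapses to a bare quote: no backslash stored
doubled = '{"value": "mysite.edu\\""}'  # \\ stores one real backslash before the quote
raw = r'{"value": "mysite.edu\""}'      # raw literal keeps the backslash exactly as typed

print(repr(plain))
print(repr(raw))
print(doubled == raw)            # both spellings produce the same characters
print(json.loads(raw)["value"])  # the parsed value ends with a double quote

try:
    json.loads(plain)
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    print("plain literal is not valid JSON:", exc)
```

The raw string is one character longer than the plain one, and that character is the backslash that JSON needs.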

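However, json.loads() also fails on the JSON strings that you extracted from the log file. Text read from a file is not subject to Python's string-literal escaping, so this strongly suggests that the problem lies in the extraction code or in the logged data itself. You should post the extraction code in your question so that it can be checked.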
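
For comparison, here is a rough sketch (Python 3 syntax) of how an extraction could look when the line really comes from a file. The log line, its timestamp prefix and the helper name extract_json are all assumptions made up for illustration, not your actual data or code, and io.StringIO merely stands in for the open log file:

```python
import io
import json

# Hypothetical syslog line. A raw literal is used so that the text matches
# what would be stored on disk, including the backslash before the quote.
log_text = r'Jan 19 22:01:00 host app: {"from_hostname": {"value": "mysite.edu\"", "value2": 0, "value3": 1}}' + "\n"


def extract_json(line):
    """Parse the text from the first '{' to the last '}' in line (hypothetical helper)."""
    start = line.index("{")
    end = line.rindex("}") + 1
    return json.loads(line[start:end])


# io.StringIO stands in for open("/path/to/log"); lines are read unchanged.
for line in io.StringIO(log_text):
    record = extract_json(line)
    print(record["from_hostname"]["value"])
```

If something like this works on your real log lines, the backslash was being lost in an earlier step. If it still fails, the log itself probably does not contain valid JSON.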

mhawke answered Sep 21 '22 18:09