I am retrieving Twitter data with a Python tool and dumping it to disk in JSON format. I noticed some unintended escaping: the entire data string for each tweet is enclosed in double quotes, and all the double quotes that belong to the actual JSON formatting are escaped with a backslash.
The output looks like this:
"{\"created_at\":\"Fri Aug 08 11:04:40 +0000 2014\",\"id\":497699913925292032,
How do I avoid that? It should be:
{"created_at":"Fri Aug 08 11:04:40 +0000 2014" .....
My file-out code looks like this:
with io.open('data'+self.timestamp+'.txt', 'a', encoding='utf-8') as f:
    f.write(unicode(json.dumps(data, ensure_ascii=False)))
    f.write(unicode('\n'))
The unintended escaping causes problems when reading in the JSON file in a later processing step.
To escape a double quote inside a JSON string, put a backslash before it (\").
JSON names require double quotes.
Using strip() to remove double quotes from a string in Python: the strip() method deletes the given characters from the start and end of a string, so it can remove quotes that sit at either end of the string.
The json.dump() method writes a Python object as JSON-formatted data to a file. The json.dumps() method encodes a Python object into a JSON-formatted string.
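As a minimal sketch of that difference (assuming Python 3, with an in-memory `io.StringIO` buffer standing in for a real file and a made-up `tweet` dict), both calls produce the same JSON text; only the destination differs:

```python
import io
import json

# Hypothetical sample object, shaped like the tweet in the question.
tweet = {"created_at": "Fri Aug 08 11:04:40 +0000 2014", "id": 497699913925292032}

# json.dumps returns the JSON text as a string.
as_text = json.dumps(tweet)

# json.dump writes the same JSON text to a file-like object instead.
buffer = io.StringIO()
json.dump(tweet, buffer)

# Both routes yield identical JSON text.
same_output = buffer.getvalue() == as_text
```

Neither function should be applied to a value that is already JSON text, because the text would then be encoded a second time as a JSON string.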
You are double-encoding your JSON strings. data is already a JSON string and doesn't need to be encoded again:
>>> import json
>>> not_encoded = {"created_at":"Fri Aug 08 11:04:40 +0000 2014"}
>>> encoded_data = json.dumps(not_encoded)
>>> print encoded_data
{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}
>>> double_encode = json.dumps(encoded_data)
>>> print double_encode
"{\"created_at\": \"Fri Aug 08 11:04:40 +0000 2014\"}"
Just write these directly to your file:
with open('data{}.txt'.format(self.timestamp), 'a') as f:
    f.write(data + '\n')
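If some files have already been written with the extra layer of encoding, they can usually be repaired by decoding twice. The following is only a sketch under stated assumptions: it targets Python 3, and the helper names `to_json_line` and `decode_line` are hypothetical, not part of the question's tool or of the `json` module.

```python
import json

def to_json_line(data):
    """Return one line of JSON text, encoding only if data is not already a string."""
    # Assumption: a str here is already JSON text (as in the question), so it is
    # passed through untouched; anything else is encoded exactly once.
    if isinstance(data, str):
        return data + "\n"
    return json.dumps(data, ensure_ascii=False) + "\n"

def decode_line(line):
    """Return the decoded object for one line, undoing double encoding if present."""
    value = json.loads(line)
    # A double-encoded line decodes to a str that itself holds JSON text,
    # so it needs a second decoding pass.
    if isinstance(value, str):
        value = json.loads(value)
    return value

# Simulate the problem from the question: JSON text that gets encoded again.
good = '{"created_at": "Fri Aug 08 11:04:40 +0000 2014"}'
double = json.dumps(good)

recovered = decode_line(double)
```

The `isinstance` check in `decode_line` assumes every top-level record is a JSON object, as tweets are; a file whose records are legitimately bare JSON strings would need a different test.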