My goal is to convert a JSON file into a format that can be uploaded from Cloud Storage into BigQuery (as described here) with Python.
I have tried using the NewlineJSON package for the conversion but receive the following error:
JSONDecodeError: Expecting value or ']': line 2 column 1 (char 5)
Does anyone have a solution to this?
Here is the sample JSON code:
[{
"key01": "value01",
"key02": "value02",
...
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
...
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
...
"keyN": "valueN"
}
]
And here's the existing Python script:
import newlinejson as nlj

with nlj.open(url_samplejson, json_lib="simplejson") as src_:
    with nlj.open(url_convertedjson, "w") as dst_:
        for line_ in src_:
            dst_.write(line_)
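The error is most likely because NewlineJSON parses its input one line at a time, and each line of a pretty-printed JSON array is not a valid JSON document on its own. A minimal workaround sketch, assuming local file names in place of the original url_samplejson / url_convertedjson variables: parse the whole array with the standard json module and only use newlinejson for writing.
import json
import newlinejson as nlj

# Hypothetical local paths standing in for url_samplejson / url_convertedjson.
with open("sample.json") as src_:
    records = json.load(src_)       # parse the entire JSON array at once

with nlj.open("converted.json", "w") as dst_:
    for record in records:
        dst_.write(record)          # one JSON object per output line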
Note that this works because JSON strings cannot contain literal newlines; a newline inside a value must be escaped as \n, so each record can safely occupy exactly one physical line of the output file. Python's json.dumps() converts a Python object into a JSON string, which is what the answers below use to serialize each record onto its own line; there is no need to parse the resulting strings back into dicts before writing them out. (The skipkeys option of dumps(), False by default, causes dict keys that are not of a basic type (str, int, float, bool, None) to be skipped instead of raising a TypeError.)
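A small illustration of the skipkeys behaviour (the dict here is just an invented example):
import json

# A tuple is not a basic key type: without skipkeys=True this call would raise
# TypeError; with skipkeys=True the offending key is silently dropped.
print(json.dumps({"ok": 1, ("not", "basic"): 2}, skipkeys=True))  # -> {"ok": 1}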
The answer with jq is really useful, but if you still want to do it with Python (as it seems from the question), you can do it with the built-in json module.
import json
from io import StringIO
in_json = StringIO("""[{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
}
]""")
result = [json.dumps(record) for record in json.load(in_json)] # the only significant line to convert the JSON to the desired format
print('\n'.join(result))
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
{"key01": "value01", "key02": "value02", "keyN": "valueN"}
* I'm using StringIO and print here just to make the sample easier to test locally.
As an alternative, you can use the Python jq binding to combine it with the other answer.
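A minimal sketch of that combination, assuming the jq package from PyPI (pip install jq) and illustrative file names:
import json
import jq  # Python binding to jq

with open("a.json") as f:
    data = json.load(f)

# '.[]' iterates over the array, exactly as in the command-line jq answer below.
records = jq.compile(".[]").input(data).all()

with open("a.ndjson", "w") as out:
    for record in records:
        out.write(json.dumps(record) + "\n")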
If you are willing to get out of Python, use jq:
$ cat a.json
[{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
},
{
"key01": "value01",
"key02": "value02",
"keyN": "valueN"
}
]
$ cat a.json | jq -c '.[]'
{"key01":"value01","key02":"value02","keyN":"valueN"}
{"key01":"value01","key02":"value02","keyN":"valueN"}
{"key01":"value01","key02":"value02","keyN":"valueN"}
The iterator I used is '.[]' to go through the array, and -c puts each JSON object on a single line.
This takes a JSON file and converts it into an ND-JSON file.
import json

with open("results-20190312-113458.json", "r") as read_file:
    data = json.load(read_file)          # parse the whole JSON array

result = [json.dumps(record) for record in data]

with open("nd-proceesed.json", "w") as obj:
    for i in result:
        obj.write(i + "\n")              # one JSON object per line
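Once the resulting ND-JSON file has been uploaded to Cloud Storage, it can be loaded into BigQuery. A minimal sketch using the google-cloud-bigquery client library; the bucket, dataset and table names below are placeholders:
from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,                        # or supply an explicit schema
)
load_job = client.load_table_from_uri(
    "gs://your-bucket/nd-proceesed.json",   # placeholder Cloud Storage URI
    "your_dataset.your_table",              # placeholder destination table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish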
Hope this helps someone.