I want to convert a standard JSON object into a format where each line contains a separate, self-contained valid JSON object. See JSON Lines.
JSON_file = [{u'index': 1, u'no': 'A', u'met': u'1043205'}, {u'index': 2, u'no': 'B', u'met': u'000031043206'}, {u'index': 3, u'no': 'C', u'met': u'0031043207'}]
To JSONL:
{u'index': 1, u'no': 'A', u'met': u'1043205'}
{u'index': 2, u'no': 'B', u'met': u'000031043206'}
{u'index': 3, u'no': 'C', u'met': u'0031043207'}
My current solution is to read the JSON file as a text file and remove the `[` from the beginning and the `]` from the end. This leaves a valid JSON object on each line, rather than a nested object containing lines.
I wonder if there is a more elegant solution? I suspect something could go wrong using string manipulation on the file.
The motivation is to read `json` files into an RDD on Spark. See the related question: Reading JSON with Apache Spark - `corrupt_record`
JSONL uses UTF-8 encoding. Plain JSON also allows Unicode strings to be written using only ASCII escape sequences, but those escapes are harder to read in a text editor. Each line is a valid JSON value, and lines are separated by a newline character, '\n'.
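To illustrate the encoding point with Python's `json` module, here is a small sketch. By default `json.dumps()` escapes non-ASCII characters; passing `ensure_ascii=False` keeps them as literal characters, which then have to be written out as UTF-8. The sample record is invented for the example.

```python
import json

record = {"name": "Zoë"}  # made-up sample value with a non-ASCII character

escaped = json.dumps(record)                      # non-ASCII written as \uXXXX
literal = json.dumps(record, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(literal)

# Both forms decode to the same data; only the text on disk differs.
utf8_line = (literal + "\n").encode("utf-8")
```

Either form parses back to the same data, so the choice mostly affects how readable the file is in a text editor.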
Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.
If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:
import json

with open('output.jsonl', 'w') as outfile:
    for entry in JSON_file:
        # one JSON object per line
        json.dump(entry, outfile)
        outfile.write('\n')
The default configuration for the `json` module is to output JSON without embedded newlines.
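A quick check of that default, as a minimal sketch: even when a string value itself contains a line break, `json.dumps()` escapes it inside the JSON string, so each serialized entry stays on one physical line. The sample entry here is made up for illustration.

```python
import json

# Hypothetical entry whose value contains a real line break.
entry = {"index": 1, "note": "first line\nsecond line"}

line = json.dumps(entry)

# The line break is escaped inside the JSON string, so the output is a single line.
print(line)
print("\n" in line)
```

This only holds while the `indent` argument is left unset; pretty-printed output spans several lines and would break the one-object-per-line layout.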
Assuming your `A`, `B` and `C` names are really strings, that would produce:
{"index": 1, "met": "1043205", "no": "A"}
{"index": 2, "met": "000031043206", "no": "B"}
{"index": 3, "met": "0031043207", "no": "C"}
If you started with a JSON document containing a list of entries, just parse that document first with `json.load()`/`json.loads()`.
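As a minimal sketch of that last step, assuming the source document is a JSON array of objects: parse it, then write one entry per line. The function name `json_array_to_jsonl` and the sample text are illustrative, not part of the question.

```python
import io
import json


def json_array_to_jsonl(json_text, outfile):
    """Parse a JSON array and write each element to outfile as one line."""
    entries = json.loads(json_text)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array at the top level")
    for entry in entries:
        json.dump(entry, outfile)
        outfile.write("\n")


# Sample input mirroring the data in the question, as real JSON text.
source = (
    '[{"index": 1, "no": "A", "met": "1043205"}, '
    '{"index": 2, "no": "B", "met": "000031043206"}]'
)

buffer = io.StringIO()  # stands in for an open output file
json_array_to_jsonl(source, buffer)
lines = buffer.getvalue().splitlines()
print(lines)
```

For a document on disk, `json.load(infile)` would take the place of `json.loads(json_text)`, and a file opened with `open('output.jsonl', 'w')` would take the place of the `StringIO` buffer.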