 

Python conversion from JSON to JSONL

Tags:

python

json

I want to convert a standard JSON document into one where each line contains a separate, self-contained, valid JSON object. See JSON Lines.

    JSON_file = [
        {u'index': 1, u'no': 'A', u'met': u'1043205'},
        {u'index': 2, u'no': 'B', u'met': u'000031043206'},
        {u'index': 3, u'no': 'C', u'met': u'0031043207'},
    ]

To JSONL:

    {u'index': 1, u'no': 'A', u'met': u'1043205'}
    {u'index': 2, u'no': 'B', u'met': u'000031043206'}
    {u'index': 3, u'no': 'C', u'met': u'0031043207'}
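In JSON Lines every line must parse on its own as one JSON value. A minimal sketch of that property, assuming the records above written in real JSON syntax (double quotes) and using a hypothetical helper `parse_json_lines`, is to parse the text line by line with `json.loads`:

```python
import json

def parse_json_lines(text):
    """Parse JSON Lines text: exactly one complete JSON value per line.

    A single trailing newline after the last record is allowed.
    Splitting on '\n' only (not str.splitlines) matches the JSONL rule
    that records are separated by the newline character.
    """
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty string left by the trailing newline
    return [json.loads(line) for line in lines]

# The question's data in JSON syntax, one object per line.
jsonl_text = (
    '{"index": 1, "no": "A", "met": "1043205"}\n'
    '{"index": 2, "no": "B", "met": "000031043206"}\n'
    '{"index": 3, "no": "C", "met": "0031043207"}\n'
)

records = parse_json_lines(jsonl_text)
print(len(records))       # 3
print(records[1]["met"])  # 000031043206
```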

My current solution is to read the JSON file as a text file and remove the [ from the beginning and the ] from the end, with the aim of leaving a valid JSON object on each line rather than one nested object containing all of them.

Is there a more elegant solution? I suspect something could go wrong when manipulating the file as a string.
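A short sketch of why the bracket-stripping approach is fragile, assuming the file was written by `json.dumps` in its default compact form: removing `[` and `]` leaves every object on one line, still separated by `, `, so the result is neither one object per line nor parseable as a single JSON value. (With `indent=2`, each object would instead be spread over several lines.)

```python
import json

records = [
    {"index": 1, "no": "A", "met": "1043205"},
    {"index": 2, "no": "B", "met": "000031043206"},
    {"index": 3, "no": "C", "met": "0031043207"},
]

# A compact JSON array, as json.dumps writes it by default.
array_text = json.dumps(records)

# The string-manipulation approach: drop the leading "[" and trailing "]".
stripped = array_text.strip()[1:-1]

# Everything is still on a single line, joined by ", ".
lines = stripped.split("\n")
print(len(lines))  # 1

# The remaining line is several JSON objects, not one JSON value.
try:
    json.loads(lines[0])
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    print("not a single JSON value:", exc.msg)  # Extra data
```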

The motivation is to read JSON files into an RDD in Spark. See the related question: Reading JSON with Apache Spark - `corrupt_record`.

asked Aug 12 '16 10:08 by LearningSlowly




1 Answer

Your input appears to be a sequence of Python objects; it certainly is not a valid JSON document.
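To illustrate the difference between Python literal syntax and JSON, here is a minimal sketch (assuming Python 3.7+, where `repr` no longer shows the `u` prefix and dicts keep insertion order): the `repr` of a dict uses single quotes, which `json.loads` rejects, while `json.dumps` produces double-quoted JSON.

```python
import json

entry = {"index": 1, "no": "A", "met": "1043205"}

python_text = repr(entry)      # Python literal syntax: single-quoted strings
json_text = json.dumps(entry)  # JSON syntax: double-quoted strings

print(python_text)  # {'index': 1, 'no': 'A', 'met': '1043205'}
print(json_text)    # {"index": 1, "no": "A", "met": "1043205"}

# JSON requires double-quoted strings, so the Python repr does not parse.
try:
    json.loads(python_text)
    python_text_is_json = True
except json.JSONDecodeError:
    python_text_is_json = False

print(python_text_is_json)  # False
```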

If you have a list of Python dictionaries, then all you have to do is dump each entry into a file separately, followed by a newline:

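    import json

    # JSON_file is the list of dictionaries from the question.
    with open('output.jsonl', 'w') as outfile:
        for entry in JSON_file:
            json.dump(entry, outfile)  # one compact JSON object...
            outfile.write('\n')        # ...terminated by a newline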

By default, the json module writes JSON without embedded newlines (they only appear when you pass an indent), so each json.dump() call above produces exactly one line.
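A quick check of that default, as a sketch using a hypothetical extra `note` field that contains a newline: the compact encoder escapes the newline inside the string as the two characters `\n`, so the output stays on one line, whereas passing `indent` inserts real line breaks.

```python
import json

# "note" is a hypothetical field, added only to show string escaping.
entry = {"index": 2, "no": "B", "met": "000031043206", "note": "line one\nline two"}

compact = json.dumps(entry)           # default indent=None: single line
pretty = json.dumps(entry, indent=2)  # indent puts each item on its own line

print("\n" in compact)  # False: the newline inside "note" is escaped
print("\n" in pretty)   # True
```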

Assuming your A, B and C names are really strings, that would produce:

    {"index": 1, "met": "1043205", "no": "A"}
    {"index": 2, "met": "000031043206", "no": "B"}
    {"index": 3, "met": "0031043207", "no": "C"}

If you started with a JSON document containing a list of entries, just parse that document first with json.load()/json.loads().
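Putting both steps together, a minimal end-to-end sketch, assuming Python 3 and a source file whose top-level value is a JSON array; `json_array_to_jsonl` and the file names are hypothetical, chosen only for this example:

```python
import json
import os
import tempfile

def json_array_to_jsonl(src_path, dst_path):
    """Convert a file holding one JSON array into a JSON Lines file.

    Each element of the top-level array becomes one line of output.
    Returns the number of records written.
    """
    with open(src_path, encoding="utf-8") as infile:
        entries = json.load(infile)  # parse the whole JSON document first
    if not isinstance(entries, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as outfile:
        for entry in entries:
            json.dump(entry, outfile)  # compact, no embedded newlines
            outfile.write("\n")
    return len(entries)

# Demo in a throwaway directory: a pretty-printed array in, JSONL out.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.json")
    dst = os.path.join(tmp, "output.jsonl")
    with open(src, "w", encoding="utf-8") as f:
        json.dump([{"index": 1, "no": "A"}, {"index": 2, "no": "B"}], f, indent=2)
    count = json_array_to_jsonl(src, dst)
    with open(dst, encoding="utf-8") as f:
        output_lines = f.read().splitlines()

print(count)         # 2
print(output_lines)  # ['{"index": 1, "no": "A"}', '{"index": 2, "no": "B"}']
```

Because the source is parsed with `json.load` rather than edited as text, it does not matter how the input array was indented. Note that this reads the whole document into memory at once.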

answered Sep 21 '22 10:09 by Martijn Pieters