 

How do I use the 'json' module to read in one JSON object at a time?


I have a multi-gigabyte JSON file. The file is made up of JSON objects that are no more than a few thousand characters each, but there are no line breaks between the records.

Using Python 3 and the json module, how can I read one JSON object at a time from the file into memory?

The data is in a plain text file. Here is an example of a similar record; the actual records contain many nested dictionaries and lists.

Record in readable format:

```json
{
  "results": {
    "__metadata": {
      "type": "DataServiceProviderDemo.Address"
    },
    "Street": "NE 228th",
    "City": "Sammamish",
    "State": "WA",
    "ZipCode": "98074",
    "Country": "USA"
  }
}
```

Actual format: each new record starts immediately after the previous one, with no separator between them.

```json
{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }{"results": { "__metadata": {"type": "DataServiceProviderDemo.Address"},"Street": "NE 228th","City": "Sammamish","State": "WA","ZipCode": "98074","Country": "USA" } }
```
asked Feb 11 '14 at 17:02 by Cam





1 Answer

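Generally speaking, putting more than one JSON object into a file makes that file invalid JSON. That said, you can still parse the data in chunks using the `JSONDecoder.raw_decode()` method, which decodes a single JSON value from the start of a string and returns it together with the index where parsing stopped, ignoring whatever follows.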
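To illustrate the difference, here is a minimal sketch using a made-up two-object string: `json.loads()` rejects the concatenated input with an "Extra data" error, while `raw_decode()` returns the first object plus the offset where it stopped. That offset can be passed back in as the optional `idx` argument to decode the next object.

```python
import json
from json import JSONDecoder

# Made-up sample: two JSON objects written back-to-back
text = '{"a": 1}{"b": [2, 3]}'

# json.loads() accepts exactly one document and rejects trailing data
try:
    json.loads(text)
    loads_error = None
except json.JSONDecodeError as exc:
    loads_error = exc.msg
print('json.loads failed:', loads_error)

# raw_decode() parses one value and returns (value, end_index)
decoder = JSONDecoder()
first, end = decoder.raw_decode(text)
second, _ = decoder.raw_decode(text, end)  # resume where the first one ended
print(first, end, second)
```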

The following will yield complete objects as the parser finds them:

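```python
from functools import partial
from json import JSONDecoder


def json_parse(fileobj, decoder=JSONDecoder(), buffersize=2048):
    """Yield each JSON object from a text stream of back-to-back JSON objects."""
    buffer = ''
    for chunk in iter(partial(fileobj.read, buffersize), ''):
        # raw_decode() does not skip leading whitespace, so strip it first
        buffer = (buffer + chunk).lstrip()
        while buffer:
            try:
                result, index = decoder.raw_decode(buffer)
            except ValueError:
                # Not enough data to decode, read more
                break
            yield result
            # Drop the decoded text and any whitespace before the next object
            buffer = buffer[index:].lstrip()
    if buffer:
        # Data left over at end of file never formed a complete object
        raise ValueError('Incomplete JSON data at end of file: %r' % buffer[:80])
```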

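This function reads from the given file object `buffersize` characters at a time and has the decoder parse whole JSON objects out of the accumulated buffer. Each object is yielded to the caller as soon as it is complete, so memory use stays at roughly one record plus one chunk rather than the whole file.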
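The `except ValueError` branch works because `raw_decode()` raises `json.JSONDecodeError`, a subclass of `ValueError`, when the buffer ends partway through a record. The short sketch below, using a made-up truncated record, shows this and shows that the same text decodes once the rest arrives. Note that the same exception is raised for genuinely malformed data, so the generator cannot tell the two cases apart until the input runs out.

```python
from json import JSONDecodeError, JSONDecoder

decoder = JSONDecoder()
# Hypothetical record cut off partway through, as a read() chunk might leave it
truncated = '{"results": {"City": "Samm'

try:
    decoder.raw_decode(truncated)
    incomplete = False
except ValueError as exc:
    # JSONDecodeError is a subclass of ValueError, so this branch catches it
    incomplete = True
    print(type(exc).__name__, issubclass(JSONDecodeError, ValueError))

# Once the rest of the record has been read, the same text decodes cleanly
full_text = truncated + 'amish"}}'
complete, end = decoder.raw_decode(full_text)
print(complete, end == len(full_text))
```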

Use it like this:

```python
with open('yourfilename', 'r') as infh:
    for data in json_parse(infh):
        # process object
        pass
```

You only need this approach if your JSON objects are written to the file back-to-back, with no newlines in between. If you do have newlines, and each JSON object is limited to a single line, you have a JSON Lines document; in that case, see the question "Loading and parsing a JSON file with multiple JSON objects in Python" instead.
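For the JSON Lines case no custom buffering is needed: iterating over a text file yields one line at a time, and each line can be passed to `json.loads()`. This is a minimal sketch; the function name `iter_json_lines` is hypothetical, the sample records are made up, and `io.StringIO` stands in for a real file opened in text mode.

```python
import io
import json


def iter_json_lines(fileobj):
    """Yield one parsed object per non-blank line (JSON Lines format)."""
    for line in fileobj:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at EOF
            yield json.loads(line)


# io.StringIO stands in for open('yourfilename', 'r')
sample = io.StringIO('{"City": "Sammamish"}\n{"City": "Redmond"}\n')
records = list(iter_json_lines(sample))
print(records)
```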

answered Oct 28 '22 at 23:10 by Martijn Pieters
