I am writing a program which requires me to generate a very large JSON file. I know the traditional way is to dump a list of dictionaries using json.dump(), but the list has grown so big that even the total memory plus swap space cannot hold it before it is dumped. Is there any way to stream it into a JSON file, i.e., write the data into the JSON file incrementally?
I know this is a year late, but the issue is still open and I'm surprised json.iterencode() was not mentioned.
The potential problem with iterencode in this example is that you would want an iterative handle on the large data set by using a generator, and the json encoder does not serialise generators.
The way around this is to subclass the list type and override the __iter__ magic method so that it yields the output of your generator.
Here is an example of this list subclass.
class StreamArray(list):
    """
    Converts a generator into a list object that can be json serialised
    while still retaining the iterative nature of a generator.

    i.e. it converts it to a list without having to exhaust the generator
    and keep its contents in memory.
    """
    def __init__(self, generator):
        self.generator = generator
        # Report a non-zero length up front so the encoder does not
        # short-circuit the object as an empty list before iterating it.
        self._len = 1

    def __iter__(self):
        self._len = 0
        for item in self.generator:
            yield item
            self._len += 1

    def __len__(self):
        """
        The JSON encoder looks for this method to confirm whether or not
        the object can be serialised as a list.
        """
        return self._len
The usage from here on is quite simple. Get the generator handle, pass it into the StreamArray class, pass the stream array object into iterencode() and iterate over the chunks. The chunks will be JSON-formatted output that can be written directly to file.
Example usage:
import json

# Function that will iteratively generate a large set of data.
def large_list_generator_func():
    for i in range(5):
        chunk = {'hello_world': i}
        print('Yielding chunk:', chunk)
        yield chunk

# Write the contents to file:
with open('/tmp/streamed_write.json', 'w') as outfile:
    large_generator_handle = large_list_generator_func()
    stream_array = StreamArray(large_generator_handle)
    for chunk in json.JSONEncoder().iterencode(stream_array):
        print('Writing chunk:', chunk)
        outfile.write(chunk)
The output below shows that yields and writes happen consecutively.
Yielding chunk: {'hello_world': 0}
Writing chunk: [
Writing chunk: {
Writing chunk: "hello_world"
Writing chunk: :
Writing chunk: 0
Writing chunk: }
Yielding chunk: {'hello_world': 1}
Writing chunk: ,
Writing chunk: {
Writing chunk: "hello_world"
Writing chunk: :
Writing chunk: 1
Writing chunk: }
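If you prefer, json.dump() should also work here: it iterates over the encoder's chunks and writes each one to the file object, so passing the StreamArray straight to it gives the same streaming behaviour without the explicit loop. A minimal sketch, assuming the StreamArray class and large_list_generator_func defined above and a hypothetical output path:
import json

# Assumes StreamArray and large_list_generator_func from above.
# json.dump() writes the encoder's output to the file object chunk by
# chunk, so the full list is never materialised in memory.
with open('/tmp/streamed_dump.json', 'w') as outfile:
    json.dump(StreamArray(large_list_generator_func()), outfile)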
Sadly the json library does not have any incremental writing facilities, and therefore cannot do what you want.
That's clearly going to be a very large file - would some other representation be more appropriate?
Otherwise the best suggestion I can make is to dump each list entry to an in-memory structure and write them out with the necessary delimiters ([ at the beginning, , between entries and ] at the end) to construct the JSON that you need.
If formatting is important you should know that the wrapper text your program writes will destroy correct indentation, but indentation is only for humans so it shouldn't make any difference to the semantics of the JSON structure.
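For illustration, here is a minimal sketch of that manual-delimiter approach, using a hypothetical write_json_array helper and json.dump() for the individual entries; only one entry is held in memory at a time.
import json

def write_json_array(entries, path):
    """Write an iterable of JSON-serialisable entries to `path` as a JSON array."""
    with open(path, 'w') as outfile:
        outfile.write('[')                  # opening delimiter
        for i, entry in enumerate(entries):
            if i:
                outfile.write(',')          # delimiter between entries
            json.dump(entry, outfile)       # dump one entry at a time
        outfile.write(']')                  # closing delimiter

# Example: stream five small dictionaries to a file.
write_json_array(({'hello_world': i} for i in range(5)), '/tmp/manual_write.json')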