
Python: make a list generator JSON serializable


How can I concatenate a list of JSON files into one huge JSON array? I have 5000 files containing 550 000 list items in total.

My first try was to use jq, but it looks like jq -s is not optimized for large input.

jq -s -r '[.[][]]' *.js  

This command works, but takes way too long to complete and I really would like to solve this with Python.

Here is my current code:

    def concatFiles(outName, inFileNames):
        def listGenerator():
            for inName in inFileNames:
                with open(inName, 'r') as f:
                    for item in json.load(f):
                        yield item

        with open(outName, 'w') as f:
            json.dump(listGenerator(), f)

I'm getting:

TypeError: <generator object listGenerator at 0x7f94dc2eb3c0> is not JSON serializable 

Any attempt to load all files into RAM triggers the Linux OOM killer. Do you have any ideas?

Sebastian Wagner asked Feb 09 '14 19:02



1 Answer

As of simplejson 3.8.0, you can use the iterable_as_array option to serialize any iterable as a JSON array:

    # Since simplejson is backwards compatible, you should feel free to import
    # it as `json`
    import simplejson as json
    json.dumps((i*i for i in range(10)), iterable_as_array=True)

The result is the string '[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]'.
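
If installing simplejson is not an option, a different approach that needs only the standard library is to write the array brackets and separators by hand and serialize one item at a time. The following is a minimal sketch, not the method from the answer above; the function name concat_json_arrays is hypothetical, and it assumes every input file contains a single JSON array that fits in memory on its own.

```python
import json


def concat_json_arrays(out_name, in_file_names):
    """Stream the items of several JSON array files into one JSON array file."""
    first = True
    with open(out_name, 'w') as out:
        out.write('[')
        for in_name in in_file_names:
            with open(in_name, 'r') as f:
                # Only one input file is loaded at a time.
                for item in json.load(f):
                    # Write a separator before every item except the first.
                    if not first:
                        out.write(', ')
                    json.dump(item, out)
                    first = False
        out.write(']')
```

Because each item is written as soon as it is read, peak memory use is bounded by the largest single input file rather than by the combined output.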

Nick Babcock answered Oct 29 '22 19:10
