I have a bunch of JSON objects that I need to compress because they're eating too much disk space: approximately 20 GB for a few million of them.
Ideally, I'd like to compress each one individually and then, when I need to read them, iteratively load and decompress each one. I tried this by creating a text file in which each line is a JSON object compressed with zlib, but reading it back fails with a decompress error due to a truncated stream, which I believe is caused by the compressed strings containing newlines.
Does anyone know of a good method to do this?
As text data, JSON data compresses nicely. That's why gzip is our first option to reduce the JSON data size. Moreover, it can be automatically applied in HTTP, the common protocol for sending and receiving JSON. Let's take the JSON produced with the default Jackson options and compress it with gzip.
However, you must configure your API to enable compression of the method response payload. To enable compression on an API, set the minimumCompressionSize property to a non-negative integer between 0 and 10485760 (10M bytes) when you create the API or after you've created it.
An image is of the type "binary" which is none of those. So you can't directly insert an image into JSON. What you can do is convert the image to a textual representation which can then be used as a normal string. The most common way to achieve that is with what's called base64.
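The same base64 idea also applies to the one-compressed-object-per-line approach from the question: zlib output is arbitrary binary and can contain newline bytes, so splitting the file on lines can cut a stream short. Below is a minimal sketch, assuming Python; the helper names `encode_record` and `decode_record` are made up for illustration and are not part of any library.

```python
import base64
import json
import zlib


def encode_record(obj):
    """Compress one JSON object and return it as a single newline-free line (bytes)."""
    raw = json.dumps(obj).encode("utf-8")
    # b64encode emits only A-Z, a-z, 0-9, '+', '/', '=' and never inserts b"\n"
    return base64.b64encode(zlib.compress(raw))


def decode_record(line):
    """Reverse encode_record: base64-decode, decompress, then parse the JSON."""
    return json.loads(zlib.decompress(base64.b64decode(line)).decode("utf-8"))


# Hypothetical sample records; note the newline inside a string value
records = [{"id": 1, "text": "line one\nline two"}, {"id": 2, "tags": ["a", "b"]}]
lines = [encode_record(r) for r in records]
restored = [decode_record(line) for line in lines]
```

Base64 grows the compressed payload by about a third, so this trades part of the savings for the ability to decode each record on its own.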
Just use a gzip.GzipFile() object and treat it like a regular file: write JSON objects line by line, and read them back line by line.
The object takes care of compression transparently and buffers reads, decompressing chunks as needed.
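import gzip
import json

# writing: GzipFile is a binary file object, so encode each line to bytes
with gzip.GzipFile(jsonfilename, 'w') as outfile:
    for obj in objects:
        outfile.write((json.dumps(obj) + '\n').encode('utf-8'))

# reading: each line comes back as bytes, which json.loads accepts (Python 3.6+)
with gzip.GzipFile(jsonfilename, 'r') as infile:
    for line in infile:
        obj = json.loads(line)
        # process obj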
This has the added advantage that the compression algorithm can exploit repetition across objects, which gives better compression ratios than compressing each object separately.
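To get a rough feel for that difference, here is a small sketch that compresses the same objects both ways and compares the total sizes. The sample data is invented for illustration, and real savings depend on how repetitive your own objects are.

```python
import gzip
import io
import json
import zlib

# Made-up sample data: many objects that share the same keys
objects = [{"user_id": i, "status": "active", "country": "US"} for i in range(1000)]

# Option 1: compress every object on its own and add up the sizes
individual_total = sum(
    len(zlib.compress(json.dumps(o).encode("utf-8"))) for o in objects
)

# Option 2: a single gzip stream holding all objects, one per line
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as f:
    for o in objects:
        f.write((json.dumps(o) + "\n").encode("utf-8"))
stream_total = len(buf.getvalue())

# Read the stream back line by line to confirm nothing was lost
with gzip.GzipFile(fileobj=io.BytesIO(buf.getvalue()), mode="rb") as f:
    roundtrip = [json.loads(line) for line in f]

print("individually compressed:", individual_total, "bytes")
print("single gzip stream:", stream_total, "bytes")
```

The in-memory `io.BytesIO` buffer is only there to keep the sketch self-contained; on disk you would pass a filename as in the code above.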