Requirement: Python objects with 2-3 levels of nesting, containing basic data types like integers, strings, lists, and dicts (no dates etc.), need to be stored as JSON in Redis against a key. What are the best methods available for compressing JSON as a string for a low memory footprint? The target objects are not very large, with about 1000 small elements on average, or roughly 15000 characters when serialized to JSON.
eg.
>>> my_dict
{'details': {'1': {'age': 13, 'name': 'dhruv'}, '2': {'age': 15, 'name': 'Matt'}}, 'members': ['1', '2']}
>>> json.dumps(my_dict)
'{"details": {"1": {"age": 13, "name": "dhruv"}, "2": {"age": 15, "name": "Matt"}}, "members": ["1", "2"]}'
### SOME BASIC COMPACTION ###
>>> json.dumps(my_dict, separators=(',',':'))
'{"details":{"1":{"age":13,"name":"dhruv"},"2":{"age":15,"name":"Matt"}},"members":["1","2"]}'
1/ Are there other, better ways to compress JSON to save memory in Redis (while also keeping decoding lightweight afterwards)?
2/ How good a candidate would msgpack [http://msgpack.org/] be?
3/ Should I consider options like pickle as well?
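For reference, question 2/ would look roughly like this with the msgpack-python package (a sketch, not a benchmark; the behaviour shown matches msgpack-python 1.x defaults, and my_dict is the example above):

import msgpack

packed = msgpack.packb(my_dict)      # compact binary encoding, usually smaller than the JSON text
restored = msgpack.unpackb(packed)   # back to a Python dict
assert restored == my_dict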
As text data, JSON compresses nicely. That is why gzip is our first option for reducing the JSON data size; moreover, it can be applied automatically in HTTP, the common protocol for sending and receiving JSON. Though JSON is a concise format, it is still better used over a slow network in compressed form. Let's take the JSON produced above and compress it with gzip.
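A minimal sketch of that round trip in Python 3, reusing my_dict from the question (gzip.compress and gzip.decompress are in the standard library; the size ratio will vary with your data):

import gzip
import json

payload = json.dumps(my_dict, separators=(',', ':')).encode('utf-8')
compressed = gzip.compress(payload)          # bytes, ready to store in Redis
restored = json.loads(gzip.decompress(compressed).decode('utf-8'))
assert restored == my_dict
print(len(payload), '->', len(compressed))   # repetitive JSON shrinks well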
As for size: there is a hard upper limit (1 GB), but the ideal size is well below that. If you need a number, my recommendation is to keep individual JSON documents to a few MB.
Newline-delimited JSON files (JSON Lines) have a huge advantage: you can iterate over each data point without loading the whole object (list) into memory. The Python library jsonlines can write and read these files, as sketched below.
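A small sketch with the jsonlines package (the records and file name are just examples):

import jsonlines

records = [{'age': 13, 'name': 'dhruv'}, {'age': 15, 'name': 'Matt'}]

with jsonlines.open('members.jsonl', mode='w') as writer:  # one JSON object per line
    for record in records:
        writer.write(record)

with jsonlines.open('members.jsonl') as reader:            # iterate lazily, line by line
    for obj in reader:
        print(obj)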
We just use gzip as a compressor.
import gzip
import cStringIO

def decompressStringToFile(value, outputFile):
    """
    decompress the given string value (which must be valid compressed gzip
    data) and write the result in the given open file.
    """
    stream = cStringIO.StringIO(value)
    decompressor = gzip.GzipFile(fileobj=stream, mode='r')
    while True:  # until EOF
        chunk = decompressor.read(8192)
        if not chunk:
            decompressor.close()
            outputFile.close()
            return
        outputFile.write(chunk)

def compressFileToString(inputFile):
    """
    read the given open file, compress the data and return it as string.
    """
    stream = cStringIO.StringIO()
    compressor = gzip.GzipFile(fileobj=stream, mode='w')
    while True:  # until EOF
        chunk = inputFile.read(8192)
        if not chunk:  # EOF?
            compressor.close()
            return stream.getvalue()
        compressor.write(chunk)
In our use case we store the result as files, as you can imagine. To work with in-memory strings only, you can use a cStringIO.StringIO() object as a replacement for the file, as shown below.
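For example, a round trip entirely in memory with the helpers above (Python 2; my_dict is the example from the question, and the file name is hypothetical):

import json
import cStringIO

# compress: wrap the JSON string in a file-like object
data = json.dumps(my_dict, separators=(',', ':'))
compressed = compressFileToString(cStringIO.StringIO(data))
print(len(compressed))  # compressed byte count

# decompress: decompressStringToFile() closes its target, so write to a real
# file (or adapt the helper) if you need to read the result back afterwards.
decompressStringToFile(compressed, open('roundtrip.json', 'wb'))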
Based on @Alfe's answer above, here is a version that keeps the contents in memory (intended for network I/O tasks). I also made a few changes to support Python 3.
import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
    """
    decompress the given byte array (which must be valid
    compressed gzip data) and return the decoded text (utf-8).
    """
    bio = BytesIO()
    stream = BytesIO(inputBytes)
    decompressor = gzip.GzipFile(fileobj=stream, mode='r')
    while True:  # until EOF
        chunk = decompressor.read(8192)
        if not chunk:
            decompressor.close()
            bio.seek(0)
            return bio.read().decode("utf-8")
        bio.write(chunk)

def compressStringToBytes(inputString):
    """
    read the given string, encode it in utf-8,
    compress the data and return it as a byte array.
    """
    bio = BytesIO()
    bio.write(inputString.encode("utf-8"))
    bio.seek(0)
    stream = BytesIO()
    compressor = gzip.GzipFile(fileobj=stream, mode='w')
    while True:  # until EOF
        chunk = bio.read(8192)
        if not chunk:  # EOF?
            compressor.close()
            return stream.getvalue()
        compressor.write(chunk)
To test the compression, try:
inputString = "asdf" * 1000
print(len(inputString))                              # 4000
print(len(compressStringToBytes(inputString)))       # far smaller; the input is highly repetitive
print(decompressBytesToString(compressStringToBytes(inputString)))  # the original string back
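To close the loop on the original question, here is a sketch of storing and fetching such a compressed payload in Redis, assuming the redis-py client; the key name and connection details are placeholders, and my_dict is the example from the question.

import json
import redis

r = redis.Redis(host='localhost', port=6379)

# store: JSON-encode with compact separators, then gzip to bytes
r.set('my_key', compressStringToBytes(json.dumps(my_dict, separators=(',', ':'))))

# fetch: gunzip the bytes and JSON-decode back to a Python object
restored = json.loads(decompressBytesToString(r.get('my_key')))
assert restored == my_dict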