What is the best way to compress JSON for storage in a memory-based store like Redis or Memcached?

Requirement: Python objects with 2-3 levels of nesting, containing basic datatypes like integers, strings, lists, and dicts (no dates etc.), need to be stored as JSON in Redis against a key. What are the best methods available for compressing JSON as a string for a low memory footprint? The target objects are not very large, having 1000 small elements on average, or about 15000 characters when converted to JSON.

e.g.

>>> my_dict
{'details': {'1': {'age': 13, 'name': 'dhruv'}, '2': {'age': 15, 'name': 'Matt'}}, 'members': ['1', '2']}
>>> json.dumps(my_dict)
'{"details": {"1": {"age": 13, "name": "dhruv"}, "2": {"age": 15, "name": "Matt"}}, "members": ["1", "2"]}'
### SOME BASIC COMPACTION ###
>>> json.dumps(my_dict, separators=(',',':'))
'{"details":{"1":{"age":13,"name":"dhruv"},"2":{"age":15,"name":"Matt"}},"members":["1","2"]}'

1/ Are there any other, better ways to compress JSON to save memory in Redis (while also ensuring lightweight decoding afterwards)?

2/ How good a candidate would msgpack (http://msgpack.org/) be?

3/ Should I consider options like pickle as well?
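
For reference, here is a minimal sketch for weighing these candidates against each other on a sample object (it assumes the third-party msgpack package is installed; zlib and pickle are in the standard library):

import json
import pickle
import zlib

import msgpack  # third-party: pip install msgpack

my_dict = {'details': {'1': {'age': 13, 'name': 'dhruv'},
                       '2': {'age': 15, 'name': 'Matt'}},
           'members': ['1', '2']}

compact_json = json.dumps(my_dict, separators=(',', ':')).encode('utf-8')
candidates = {
    'compact json': compact_json,
    'json + zlib': zlib.compress(compact_json),
    'msgpack': msgpack.packb(my_dict),
    'pickle': pickle.dumps(my_dict, protocol=pickle.HIGHEST_PROTOCOL),
}
# print each encoding's size, smallest first
for name, blob in sorted(candidates.items(), key=lambda kv: len(kv[1])):
    print('%-14s %4d bytes' % (name, len(blob)))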

asked Mar 20 '13 by DhruvPathak


2 Answers

We just use gzip as a compressor.

import gzip
import cStringIO

def decompressStringToFile(value, outputFile):
  """
  decompress the given string value (which must be valid compressed gzip
  data) and write the result to the given open file. Note that the
  output file is closed before returning.
  """
  stream = cStringIO.StringIO(value)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      outputFile.close()
      return 
    outputFile.write(chunk)

def compressFileToString(inputFile):
  """
  read the given open file, compress the data and return it as string.
  """
  stream = cStringIO.StringIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = inputFile.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

In our use case we store the result as files, as you can imagine. To work with in-memory strings only, you can use a cStringIO.StringIO() object as a replacement for the file as well, as sketched below.
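
One caveat for the purely in-memory round trip: decompressStringToFile() closes the file it writes into, and a closed StringIO discards its buffer. A sketch (Python 2, to match the answer's use of cStringIO) that works around this with a no-op close():

import StringIO  # the pure-Python module, so close() can be overridden

class KeepOpenStringIO(StringIO.StringIO):
  # decompressStringToFile() closes its output file, and a closed
  # StringIO discards its buffer; a no-op close() keeps it readable.
  def close(self):
    pass

payload = '{"details":{"1":{"age":13,"name":"dhruv"}}}'

# compress: any file-like object with read() works as the input
compressed = compressFileToString(cStringIO.StringIO(payload))

# decompress back into an in-memory "file" and read the result
out = KeepOpenStringIO()
decompressStringToFile(compressed, out)
assert out.getvalue() == payload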

answered Oct 17 '22 by Alfe


Based on @Alfe's answer above, here is a version that keeps the contents in memory (for network I/O tasks). I also made a few changes to support Python 3.

import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
  """
  decompress the given byte array (which must be valid 
  compressed gzip data) and return the decoded text (utf-8).
  """
  bio = BytesIO()
  stream = BytesIO(inputBytes)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      bio.seek(0)
      return bio.read().decode("utf-8")
    bio.write(chunk)

def compressStringToBytes(inputString):
  """
  read the given string, encode it in utf-8,
  compress the data and return it as a byte array.
  """
  bio = BytesIO()
  bio.write(inputString.encode("utf-8"))
  bio.seek(0)
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = bio.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

To test the compression, try:

inputString="asdf" * 1000
len(inputString)
len(compressStringToBytes(inputString))
decompressBytesToString(compressStringToBytes(inputString))
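
To tie this back to the original question, the compressed bytes can go straight into Redis (a sketch assuming the third-party redis-py client and a server on localhost):

import json

import redis  # third-party client: pip install redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

my_dict = {'details': {'1': {'age': 13, 'name': 'dhruv'}},
           'members': ['1']}

# store gzip-compressed compact JSON against a key
compact = json.dumps(my_dict, separators=(',', ':'))
r.set('my_key', compressStringToBytes(compact))

# read it back: GET returns bytes, which decompress to the JSON text
restored = json.loads(decompressBytesToString(r.get('my_key')))
assert restored == my_dict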
answered Oct 17 '22 by AlaskaJoslin