Redis memory optimization

I am trying to store some data (actually one very big string) in Redis in the most memory-efficient way. The Redis docs advise to "use hashes when possible" and mention two configuration parameters:

  • "hash-max-zipmap-entries", which, if I understand correctly, is the maximum number of keys each hash may have (am I right?).

  • "hash-max-zipmap-value", which is the maximum length of a value. Does it apply to the field or to the value? And is the length measured in bytes, characters, or something else?

My idea is to split the string (which has a fixed length) into chunks sized to play well with the above parameters, and store the chunks as values. The fields would just be sequence numbers, so that the string can be decoded consistently.
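A minimal sketch of that splitting scheme, in plain Python with no Redis calls. The names `split_to_fields` and `join_fields` are made up for illustration and the chunk size is arbitrary. Decoding sorts the fields numerically rather than relying on the order in which they come back.

```python
def split_to_fields(text, chunk_size):
    """Map sequence numbers (as strings) to consecutive chunks of text."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {
        str(index): text[start:start + chunk_size]
        for index, start in enumerate(range(0, len(text), chunk_size))
    }


def join_fields(fields):
    """Rebuild the original text by ordering the chunks by numeric field."""
    ordered = sorted(fields.items(), key=lambda item: int(item[0]))
    return "".join(chunk for _, chunk in ordered)


fields = split_to_fields("ABCDEFGHIJ", 4)
print(fields)
print(join_fields(fields) == "ABCDEFGHIJ")
```

Sorting by `int(field)` matters once there are more than ten chunks, since a plain string sort would put "10" before "2".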

EDIT: I have benchmarked extensively, and storing the string in a hash seems to use about 50% less memory.

Here is my benchmarking script:

import redis, random, sys

def new_db():
    db = redis.Redis(host='localhost', port=6666, db=0)
    return db

def db_info(db):
    return " used memory %s " % db.info()["used_memory_human"]

def random_string(_len):
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
    return "".join(random.choice(letters) for _ in range(_len))

def chunk(astr, size):
    while len(astr) > size:
        yield astr[:size]
        astr = astr[size:]
    if len(astr):  
        yield astr 

def encode_as_dict(astr, size):
    # Map sequence numbers to chunks so the string can be reassembled in order.
    dod = {}
    for cnt, i in enumerate(chunk(astr, size)):
        dod[cnt] = i
    return dod

db = new_db()  # create the connection that the rest of the script uses
r = random_string(1000000)
print "size of string in bytes ", sys.getsizeof(r)
print "default Redis memory consumption", db_info(db)
dict_chunk = 10000

print "*"*100

db.set("akey", r)
print "as string " , db_info(db)
print "*"*100

db.delete("akey")  # remove the string first: a key holding a string cannot be written with HMSET
db.hmset("akey", encode_as_dict(r,dict_chunk))
print "as dict and stored at value" , db_info(db)
print "*"*100

and the results on my machine (32bit Redis instance):

size of string in bytes  1000024
default Redis memory consumption  used memory 534.52K 

as string   used memory 2.98M 
as dict and stored at value  used memory 1.49M 

My question is whether there is a more efficient way to store the string as a hash by tuning the parameters I mentioned. First I need to understand what they mean. Then I'll benchmark again and see whether there is more to gain.

EDIT2: Am I an idiot? The benchmark is correct, but it only holds for one big string. If I repeat it with many big strings, storing them as plain strings is the definite winner. I think the reason I got those results for a single string lies in the Redis internals.

2 Answers

Actually, the most efficient way to store a large string is as a large string - anything else adds overhead. The optimizations you mention are for dealing with lots of short strings, where empty space between the strings can become an issue.
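To put a rough number on "anything else adds overhead", the sketch below counts only the bytes of the extra field names that a chunked layout has to store. It ignores Redis's own per-entry bookkeeping, which depends on the version and encoding, so treat the result as a lower bound. `field_name_overhead` is a made-up helper, not part of any library.

```python
def field_name_overhead(total_length, chunk_size):
    """Bytes spent on decimal field names "0", "1", ... for the chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunk_count = -(-total_length // chunk_size)  # ceiling division
    return sum(len(str(index)) for index in range(chunk_count))


# The question's numbers: a 1,000,000-character string in 10,000-character chunks.
# That gives 100 fields: 10 one-digit names plus 90 two-digit names.
print(field_name_overhead(1000000, 10000))  # 190
```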

Performance on storing a large string may not be as good as for small strings due to the need to find more contiguous blocks to store it, but that is unlikely to actually affect anything.

I've tried reading the Redis docs about the settings you mention, and it isn't easy. But it doesn't sound to me like your plan is a good idea. The hashing they describe is designed to save memory for small values. The values are still stored completely in memory. It sounds to me like they are reducing the overhead when they appear many times, for example, when a string is added to many sets. Your string doesn't meet these criteria. I strongly doubt you will save memory using your scheme.
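As far as I can tell, the compact encoding applies only while the hash has at most `hash-max-zipmap-entries` fields and every field name and value is at most `hash-max-zipmap-value` bytes long. A minimal sketch of that rule in plain Python follows. `fits_compact_encoding` is a hypothetical helper, and the limits passed in are example values, not necessarily what your server is configured with.

```python
def fits_compact_encoding(mapping, max_entries, max_value_bytes):
    """Return True if the hash would stay within both configured limits."""
    if len(mapping) > max_entries:
        return False
    for field, value in mapping.items():
        # Lengths are measured in bytes, so encode text before measuring.
        if len(str(field).encode("utf-8")) > max_value_bytes:
            return False
        if len(str(value).encode("utf-8")) > max_value_bytes:
            return False
    return True


# Example limits only (assumed; check your own redis.conf).
MAX_ENTRIES = 512
MAX_VALUE_BYTES = 64

small = {str(i): "x" * 50 for i in range(100)}
big_chunks = {str(i): "x" * 10000 for i in range(100)}
print(fits_compact_encoding(small, MAX_ENTRIES, MAX_VALUE_BYTES))       # True
print(fits_compact_encoding(big_chunks, MAX_ENTRIES, MAX_VALUE_BYTES))  # False
```

With 10,000-byte chunks like the ones in the question, the values are far above a limit of this size, so the hash would presumably fall back to the ordinary hash table representation.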

You can of course benchmark it to see.
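A rough way to do that for many strings rather than one, assuming the redis-py client. The helper names and key prefixes are made up for illustration. Measuring the numeric `used_memory` field from `INFO` instead of the rounded `used_memory_human` makes small differences visible.

```python
def memory_delta(db, action):
    """Run action(db) and return the change in Redis's used_memory, in bytes."""
    before = db.info()["used_memory"]
    action(db)
    after = db.info()["used_memory"]
    return after - before


def store_as_strings(db, payloads):
    """Store each payload under its own string key."""
    for index, payload in enumerate(payloads):
        db.set("str:%d" % index, payload)


def store_as_hashes(db, payloads, chunk_size):
    """Store each payload as a hash of numbered chunks."""
    for index, payload in enumerate(payloads):
        for seq, start in enumerate(range(0, len(payload), chunk_size)):
            db.hset("hash:%d" % index, str(seq), payload[start:start + chunk_size])


# Usage, assuming a local test instance that you can safely flush:
# import redis
# db = redis.Redis(host="localhost", port=6379, db=0)
# payloads = ["x" * 100000 for _ in range(50)]
# db.flushdb(); print(memory_delta(db, lambda d: store_as_strings(d, payloads)))
# db.flushdb(); print(memory_delta(db, lambda d: store_as_hashes(d, payloads, 10000)))
```

Flushing between runs keeps the two measurements independent, which the single-string comparison in the question did not guarantee.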

