I am trying to encode some data (a very big string, actually) in a memory-efficient way on the Redis side. The Redis docs advise to "use hashes when possible" and describe two configuration parameters:
The "hash-max-zipmap-entries", which if I understood well it denotes how many keys at most every hash key must have ( am I right?).
The "hash-max-zipmap-value", which denotes the maximum length for the value. Does it refer to the field or to the value, actually? And the length is in bytes, characters, or what?
My idea is to split the string (which happens to have a fixed length) into chunks sized to play well with the above parameters and store them as values. The fields would simply be sequence numbers, to ensure consistent decoding.
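For completeness, decoding would just read the hash back and join the chunks in numeric field order; a rough, untested sketch:

    def decode_from_hash(db, key):
        # Fields are sequence numbers, so sort them numerically and join the chunks.
        dod = db.hgetall(key)
        return "".join(dod[k] for k in sorted(dod, key=int))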
EDIT: I have benchmarked this extensively, and it seems that encoding the string in a hash yields roughly 50% lower memory consumption.
Here is my benchmarking script:
    import redis, random, sys

    def new_db():
        db = redis.Redis(host='localhost', port=6666, db=0)
        db.flushall()
        return db

    def db_info(db):
        return " used memory %s " % db.info()["used_memory_human"]

    def random_string(_len):
        letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"
        return "".join([letters[random.randint(0, len(letters) - 1)] for i in range(_len)])

    def chunk(astr, size):
        while len(astr) > size:
            yield astr[:size]
            astr = astr[size:]
        if len(astr):
            yield astr

    def encode_as_dict(astr, size):
        dod = {}
        cnt = 0
        for i in chunk(astr, size):
            dod[cnt] = i
            cnt += 1
        return dod

    db = new_db()
    r = random_string(1000000)
    print "size of string in bytes ", sys.getsizeof(r)
    print "default Redis memory consumption", db_info(db)
    dict_chunk = 10000

    print "*" * 100
    print "BENCHMARKING \n"

    db = new_db()
    db.set("akey", r)
    print "as string ", db_info(db)
    print "*" * 100

    db = new_db()
    db.hmset("akey", encode_as_dict(r, dict_chunk))
    print "as dict and stored at value", db_info(db)
    print "*" * 100
and the results on my machine (32-bit Redis instance):
size of string in bytes 1000024
default Redis memory consumption used memory 534.52K
******************************************************************************************
BENCHMARKING
as string used memory 2.98M
******************************************************************************************
as dict and stored at value used memory 1.49M
I am asking whether there is an even more efficient way to store the string as a hash by playing with the parameters I mentioned. So first, I need to understand what they actually mean; then I'll benchmark again and see whether there is further gain.
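If my reading of the parameters is right, the experiment I have in mind would extend the script above roughly like this (untested sketch; the chunk size is just a guess to be tuned):

    # Assumption: keeping each chunk within hash-max-zipmap-value and the number of
    # chunks within hash-max-zipmap-entries should keep the hash compactly encoded.
    chunk_size = 64                      # hypothetical value, to be tuned
    n_chunks = len(r) / chunk_size + 1

    db = new_db()
    db.config_set("hash-max-zipmap-value", chunk_size)
    db.config_set("hash-max-zipmap-entries", n_chunks)
    db.hmset("akey", encode_as_dict(r, chunk_size))
    print "as dict with tuned zipmap settings", db_info(db)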
EDIT 2: Am I an idiot? The benchmark is correct, but it only holds for one big string. If I repeat it for many big strings, storing them as plain strings is the clear winner. I think the reason I got those results for a single string lies in the Redis internals.
Actually, the most efficient way to store a large string is as a large string - anything else adds overhead. The optimizations you mention are for dealing with lots of short strings, where empty space between the strings can become an issue.
Performance on storing a large string may not be as good as for small strings due to the need to find more contiguous blocks to store it, but that is unlikely to actually affect anything.
I've tried reading the Redis docs about the settings you mention, and it isn't easy. But it doesn't sound to me like your plan is a good idea. The hashing they describe is designed to save memory for small values. The values are still stored completely in memory. It sounds to me like the savings come from reducing per-value overhead when there are many such values, for example when a string is added to many sets. Your string doesn't meet these criteria. I strongly doubt you will save memory using your scheme.
You can of course benchmark it to see.
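For instance, the scenario where the hash encoding really pays off is lots of small values; a rough sketch of that comparison, using the same local instance as in your script (key names and sizes are purely illustrative):

    import redis

    db = redis.Redis(host='localhost', port=6666, db=0)

    # One small value per top-level key.
    db.flushall()
    for i in range(100000):
        db.set("user:%d" % i, "x" * 10)
    print "as plain keys   ", db.info()["used_memory_human"]

    # Same values bucketed into hashes of 100 fields each, which should stay
    # within the default zipmap/ziplist thresholds.
    db.flushall()
    for i in range(100000):
        db.hset("user:%d" % (i / 100), i % 100, "x" * 10)
    print "as small hashes ", db.info()["used_memory_human"]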