I currently store about 50k hashes in my Redis database; each one has 5 field/value pairs. Once a day I run a batch job which updates hash values, including setting some fields to the value of another field in the same hash.
Here is my Python code, which iterates through the keys and sets old_code to the new_code value if one exists for a given hash:
pipe = r.pipeline()
# First pass: read new_code for every event in a single round trip.
for availability in availabilities:
    pipe.hget(availability["EventId"], "new_code")
for availability, old_code in zip(availabilities, pipe.execute()):
    if old_code:
        availability["old_code"] = old_code.decode("utf-8")
# Second pass: write back old_code (where one was found) and the new code.
for availability in availabilities:
    if "old_code" in availability:
        pipe.hset(
            availability["EventId"], "old_code", availability["old_code"])
    pipe.hset(availability["EventId"], "new_code", availability["MsgCode"])
pipe.execute()
It seems a bit odd to me that I have to iterate through the keys twice to achieve this result. Is there a better way to do it?
Another thing I'm trying to figure out is how to get all hash values with the best performance. Here is how I currently do it:
d = []
pipe = r.pipeline()
keys = r.keys('*')
for key in keys:
    pipe.hgetall(key)
for val, key in zip(pipe.execute(), keys):
    e = {"event_id": key}
    e.update(val)
    if "old_key" not in e:
        e["old_key"] = None
    d.append(e)
So basically I run KEYS * and then iterate over all the keys with HGETALL to get the values. This is way too slow, especially the iteration. Is there a quicker way to do it?
How about an upside-down change? Transpose the way you store the data.
Instead of having 50k hashes with 5 values each, have 5 hashes with 50k values each.
For example, your hashes currently hang off EventId, and each one stores new_code, old_code, and a few other fields. Instead, make new_code itself a hash whose fields are the EventIds and whose values are the codes; new_code alone then becomes a single hash containing 50k field/value pairs. Looping through 5 hashes instead of 50k is then considerably quicker.
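As a rough sketch of that layout (untested; the redis.Redis() connection and the example EventId/code values are assumptions, not from your data):

import redis

r = redis.Redis()  # assumption: local Redis on the default port

# Transposed layout: one top-level hash per field, with EventId as the member.
r.hset("new_code", "event-123", "ABC")  # "event-123"/"ABC" are made-up examples
r.hset("old_code", "event-123", "XYZ")

# Reading every event's value for one field is now a single HGETALL
# on one hash, instead of 50k round trips:
new_codes = r.hgetall("new_code")  # {b"event-123": b"ABC", ...}
old_codes = r.hgetall("old_code")  # {b"event-123": b"XYZ", ...}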
I did a little experiment, and here are the numbers:
50k hashes * 5 fields:
Memory: ~12.5 MB
Time to loop through all elements: ~1.8 seconds

5 hashes * 50k fields:
Memory: ~35 MB
Time to loop through all elements: ~0.3 seconds
I tested with simple strings like KEY_i and VALUE_i (where i is a counter), so memory usage may be higher in your case. Also, I only walked through the data without doing any manipulation, so the timing will vary in your case as well.
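A walk-through along these lines reproduces the comparison (untested sketch; the key names and the redis.Redis() connection are assumptions):

import time
import redis

r = redis.Redis()  # assumption: local Redis on the default port

# Layout A: 50k hashes * 5 fields -- one HGETALL per event hash.
start = time.time()
for i in range(50000):
    r.hgetall(f"event:{i}")  # hypothetical key naming
print("50k x 5:", time.time() - start)

# Layout B: 5 hashes * 50k fields -- one HGETALL per field hash.
start = time.time()
for name in ("new_code", "old_code", "field3", "field4", "field5"):
    r.hgetall(name)  # field3..field5 are placeholders
print("5 x 50k:", time.time() - start)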
As you can see from those numbers, this change gives roughly a 6x performance boost, at the cost of almost 3x the memory.
Redis uses a compact encoding for small hashes, up to a configurable threshold (512 entries by default). Since we are storing far more entries than that (50k), we get this spike in memory.
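If you want to check this on your own data, redis-py can show both the threshold and the encoding a given hash actually uses (hash-max-ziplist-entries is the config name in Redis versions before 7; newer versions call it hash-max-listpack-entries):

import redis

r = redis.Redis()  # assumption: local Redis on the default port

# The compact-encoding threshold for hashes (512 entries by default).
print(r.config_get("hash-max-ziplist-entries"))

# OBJECT ENCODING reports the representation in use for an existing key:
# "ziplist"/"listpack" below the threshold, "hashtable" above it.
print(r.object("encoding", "new_code"))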
Basically it's a trade-off, and it's up to you to choose whichever best suits your application.
Hope this helps.
For your first problem, using a Lua script will definitely improve performance. This is untested, but something like:
update_hash = r.register_script("""
    local key = KEYS[1]
    local new_code = ARGV[1]
    -- Copy the current new_code to old_code (if present), then set new_code.
    local old_code = redis.call("HGET", key, "new_code")
    if old_code then
        redis.call("HMSET", key, "old_code", old_code, "new_code", new_code)
    else
        redis.call("HSET", key, "new_code", new_code)
    end
""")
# You can use transaction=False here if you don't need all the
# hashes to be updated together as one atomic unit.
pipe = r.pipeline()
for availability in availabilities:
    keys = [availability["EventId"]]
    args = [availability["MsgCode"]]
    update_hash(keys=keys, args=args, client=pipe)
pipe.execute()
For your second problem you could again make it faster by writing a short Lua script. Instead of getting all the keys and returning them to the client, your script would get the keys and the data associated with them and return it in one call.
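Also untested, but such a script could look roughly like this (using the same client r as above; note that KEYS inside the script is still O(n) and blocks the server while it runs):

fetch_all = r.register_script("""
    local result = {}
    local keys = redis.call("KEYS", "*")
    for i, key in ipairs(keys) do
        -- Each entry is {key, {field1, value1, field2, value2, ...}}.
        result[i] = {key, redis.call("HGETALL", key)}
    end
    return result
""")

rows = fetch_all(keys=[], args=[])
for key, flat in rows:
    # HGETALL comes back as a flat field/value list from Lua,
    # so pair it up client-side.
    fields = dict(zip(flat[::2], flat[1::2]))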
(Note, though, that calling keys() is inherently slow wherever you do it. And note that in either approach you're essentially pulling your entire Redis dataset into local memory, which might or might not become a problem.)
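As a side note beyond the above: SCAN is the usual non-blocking alternative to keys(). redis-py exposes it as scan_iter, which combines naturally with the pipelined HGETALL approach from the question:

# Walk the keyspace incrementally instead of one blocking KEYS call.
pipe = r.pipeline(transaction=False)
keys = []
for key in r.scan_iter(count=1000):  # count is a hint, not a hard limit
    keys.append(key)
    pipe.hgetall(key)
results = dict(zip(keys, pipe.execute()))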