Redis: the best way to get all hash values

I currently store about 50k hashes in Redis, each with 5 key/value pairs. Once a day I run a batch job which updates hash values, including copying the value of one field to another field within the same hash.

Here is my Python code, which iterates through the keys and sets old_code to the new_code value if new_code exists for a given hash:

pipe = r.pipeline()

for availability in availabilities:
    pipe.hget(availability["EventId"], "new_code")

for availability, old_code in zip(availabilities, pipe.execute()):
    if old_code:
        availability["old_code"] = old_code.decode("utf-8")

for availability in availabilities:
    if "old_code" in availability:
        pipe.hset(
            availability["EventId"], "old_code", availability["old_code"])
    pipe.hset(availability["EventId"], "new_code", availability["MsgCode"])
pipe.execute()

It seems odd to me that I have to iterate through the keys twice to achieve this. Is there a better way to do it?

Another thing I'm trying to figure out is how to get all hash values with the best performance. Here is how I currently do it:

d = []
pipe = r.pipeline()
keys = r.keys('*')
for key in keys:
    pipe.hgetall(key)
for val, key in zip(pipe.execute(), keys):
    e = {"event_id": key}
    e.update(val)
    if "old_key" not in e:
        e["old_key"] = None
    d.append(e)

So basically I do KEYS * and then iterate with HGETALL over all the keys to get the values. This is way too slow, especially the iteration. Is there a quicker way to do it?

asked Jun 30 '16 by Nikolay Derkach

2 Answers

How about an upside-down change: transpose the way you store the data.

Instead of having 50k hashes, each with 5 values, have 5 hashes, each with 50k values.

For example, your hashes are keyed by EventId, and you store new_code, old_code and other fields inside each one.

Now, instead, have a hash named new_code whose members are the EventIds and whose values are the codes. So new_code alone becomes a single hash containing 50k member/value pairs.

So looping through 5 instead of 50k will be relatively quicker.
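The reshaping itself is just a dictionary transpose; a minimal sketch (the function name and the sample IDs are made up for illustration):

```python
def transpose(events):
    """Turn {event_id: {field: value}} into {field: {event_id: value}},
    i.e. 50k small hashes into a handful of field-wide hashes."""
    by_field = {}
    for event_id, fields in events.items():
        for field, value in fields.items():
            by_field.setdefault(field, {})[event_id] = value
    return by_field

# Each inner dict can then be written with one HMSET per field:
#   for field, mapping in transpose(events).items():
#       r.hmset(field, mapping)
```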

I have done a little experiment, and here are the numbers:

50k hashes * 5 elements 
Memory : ~12.5 MB
Time to complete loop through of elements : ~1.8 seconds

5 hashes * 50k elements
Memory : ~35 MB
Time to complete loop through of elements : ~0.3 seconds.

I tested with simple strings like KEY_i and VALUE_i (where i is an incrementing counter), so memory usage may be higher in your case. Also, I only walked through the data without doing any manipulation, so the timings will vary in your case too.

As you can see, this change gives roughly a 6x speedup (1.8 s down to 0.3 s) at the cost of almost 3x the memory (12.5 MB up to 35 MB).

Redis uses a compact encoding (ziplist) for hashes up to a configurable size (512 entries by default). Since we are storing far more than that (50k) in each hash, we get this spike in memory.
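That threshold is configurable in redis.conf; these were the shipped defaults at the time:

```
# Hashes stay in the compact ziplist encoding only while they are
# below BOTH of these limits:
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
```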

Basically it's a trade-off, and it's up to you to choose whichever best suits your application.

For your 1st question:

  1. You are getting the value of new_code from each of the hashes; now everything is in a single hash, so it's just a single call.
  2. Then you are updating old_code and new_code one by one; now you can update each of them for all events with a single hmset call.
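With the transposed layout, the whole daily update collapses to three calls. A sketch against redis-py (the helper name is made up; hmget/hmset with a mapping are standard redis-py calls):

```python
def update_codes(client, availabilities):
    """Shift each event's new_code into old_code, then write the fresh
    MsgCode -- one HMGET and at most two HMSETs in total."""
    event_ids = [a["EventId"] for a in availabilities]
    # One HMGET against the field-wide "new_code" hash replaces 50k HGETs.
    current = client.hmget("new_code", event_ids)
    old = {eid: code for eid, code in zip(event_ids, current) if code}
    if old:
        client.hmset("old_code", old)
    client.hmset("new_code", {a["EventId"]: a["MsgCode"]
                              for a in availabilities})
```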

Hope this helps.

answered Nov 19 '22 by Karthikeyan Gopall

For your first problem, using a Lua script will definitely improve performance. This is untested, but something like:

update_hash = r.register_script("""
    local key = KEYS[1]
    local new_code = ARGV[1]

    local old_code = redis.call("HGET", key, "new_code")
    if old_code then
        redis.call("HMSET", key, "old_code", old_code, "new_code", new_code)
    else
        redis.call("HSET", key, "new_code", new_code)
    end
""")

# You can use transaction=False here if you don't need all the 
# hashes to be updated together as one atomic unit.
pipe = r.pipeline()

for availability in availabilities:
    keys = [availability["EventId"]]
    args = [availability["MsgCode"]]

    update_hash(keys=keys, args=args, client=pipe)

pipe.execute()

For your second problem you could again make it faster by writing a short Lua script. Instead of getting all the keys and returning them to the client, your script would get the keys and the data associated with them and return it in one call.
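An untested sketch of such a script, in the same spirit as the one above (the COUNT value and the client-side cursor loop are assumptions; you keep calling until the returned cursor is "0"):

```python
# Lua script that SCANs server-side and returns each key together with
# its HGETALL reply, so keys and data come back in one round trip.
FETCH_HASHES_LUA = """
local result = redis.call("SCAN", ARGV[1], "COUNT", 1000)
local reply = {result[1]}
for _, key in ipairs(result[2]) do
    table.insert(reply, key)
    table.insert(reply, redis.call("HGETALL", key))
end
return reply
"""
# Usage sketch:
#   fetch_hashes = r.register_script(FETCH_HASHES_LUA)
#   cursor = "0"
#   while True:
#       reply = fetch_hashes(args=[cursor])
#       cursor, key_data_pairs = reply[0], reply[1:]
#       ...  # consume key_data_pairs
#       if cursor == b"0":
#           break
```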

(Note, though, that calling keys() is inherently slow wherever you do it. And note that in either approach you're essentially pulling your entire Redis dataset into local memory, which might or might not become a problem.)
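If a Lua script feels like overkill, redis-py's scan_iter at least avoids the blocking KEYS call while keeping the single-round-trip pipeline for the bulk fetch. A sketch (the function names are made up; merge_hashes mirrors the loop in the question):

```python
def merge_hashes(keys, replies):
    """Pair each key with its HGETALL reply, defaulting old_key to None."""
    result = []
    for key, fields in zip(keys, replies):
        e = {"event_id": key}
        e.update(fields)
        e.setdefault("old_key", None)
        result.append(e)
    return result

def fetch_all(client):
    # SCAN walks the keyspace incrementally instead of blocking like KEYS.
    keys = list(client.scan_iter(count=1000))
    pipe = client.pipeline()
    for key in keys:
        pipe.hgetall(key)
    return merge_hashes(keys, pipe.execute())
```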

answered Nov 19 '22 by Kevin Christopher Henry