Understanding hash implementation and its memory usage in Redis

From the documentation we know that Redis uses a compact encoding for small hashes, up to a configurable limit (512 entries by default, via hash-max-ziplist-entries). Once a hash grows past that limit it is converted to a real hash table, and the memory difference can be around 10 times.
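For reference, here is a quick way to check this limit and a hash's current encoding with redis-cli (the key name h is hypothetical; on newer Redis versions the small-hash encoding is reported as "listpack" rather than "ziplist"):

redis-cli CONFIG GET hash-max-ziplist-entries
redis-cli HSET h field1 value1
redis-cli OBJECT ENCODING h   # compact encoding below the limit, "hashtable" above it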

I ran a small experiment with hashes ranging from 1 to 512 entries and found an interesting pattern.

This graph shows the memory taken (in KB) by 1000 hashes, each containing from 1 to 512 entries.

[Graph: memory taken (KB) by 1000 hashes vs. number of entries per hash]
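For anyone who wants to reproduce a single data point of this graph, here is a rough sketch assuming bash and redis-cli (the key and field names h:0..h:999, fieldN, valueN are made up for illustration):

# Fill 1000 hashes with N fields each, then read total allocated memory.
N=216
redis-cli FLUSHDB
for i in $(seq 0 999); do
  for j in $(seq 1 "$N"); do
    redis-cli HSET "h:$i" "field$j" "value$j" > /dev/null
  done
done
redis-cli INFO memory | grep '^used_memory:'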

As you can see in the graph, there are steep jumps in memory at certain intervals. I understand that the hash implementation in Redis follows some logic for extending its size once it reaches a certain threshold, rather than growing on every new entry. But from the numbers, it doesn't follow a doubling pattern throughout: from 215 to 216 entries the memory exactly doubles (4 MB to 8 MB), while from 420 to 421 it grows by about half (8 MB to 12 MB). For the jumps below 215 I couldn't see any pattern; they vary between 1/4th, 1/5th, and 1/6th.

Based on these observations, my questions are:

  1. Can someone explain the internals of the hash map in terms of memory and resizing? What logic is followed during resizing?
  2. If I have to lose double the memory just for storing one more entry (going from 215 to 216 entries), why shouldn't I restrict my application to hashes of fewer than 215 entries, unless the system absolutely needs more?
  3. Suppose I want to store 1 million hashes, each consisting of 250 values; I need 800 MB. If I split each into 2 hashes of 125 values, i.e., 2 million hashes of 125 values each, I need 500 MB. This way I save 300 MB, which is huge! Is this calculation right? Am I missing something?

Thanks in advance

asked Jun 03 '16 by Karthikeyan Gopall

2 Answers

Since you have a constant 1000 keys in Redis, with only the number of fields per hash key changing, and since the field count stays below 512, this phenomenon is caused entirely by jemalloc.

For comparison, you can test the behavior with libc as the memory allocator. Rebuild Redis with:

make MALLOC=libc

Run your test again and see what you get.
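To confirm which allocator a build is actually using, check the mem_allocator field that Redis reports:

redis-cli INFO memory | grep mem_allocator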

To answer your questions:

  1. Can someone explain the internals of the hash map in terms of memory and resizing? What logic is followed during resizing?

    As mentioned, what you encountered has nothing to do with Redis itself. jemalloc allocates memory in fixed size classes, so when a growing allocation crosses a class boundary it is rounded up to the next class; that rounding is what produces the jumps, and jemalloc does it this way for efficiency. (See the demonstration after this list.)

  2. If I have to lose double the memory just for storing one more entry (going from 215 to 216 entries), why shouldn't I restrict my application to hashes of fewer than 215 entries, unless the system absolutely needs more?

    Of course, you can do that, as long as you can restrict the number of fields.

  3. Suppose I want to store 1 million hashes, each consisting of 250 values; I need 800 MB. If I split each into 2 hashes of 125 values, i.e., 2 million hashes of 125 values each, I need 500 MB. This way I save 300 MB, which is huge! Is this calculation right? Am I missing something?

    I don't think that's a proper way to do it. You might save some memory, but there are disadvantages: by splitting 1 million hashes into 2 million, Redis has to do more rehashing (which costs space), and finding a key takes more time because there is a higher chance of hash collisions. A sketch of such a split is shown after this list.
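To make the size-class effect concrete, here is a small, hypothetical demonstration (the keys s1 and s2 are made up, and MEMORY USAGE requires Redis 4.0 or later). With jemalloc, the reported allocation for a value is rounded up to the allocator's size class, so two values of slightly different lengths can land in the same class or jump a whole class apart:

# Two strings of slightly different lengths (60 vs. 70 bytes).
redis-cli SET s1 "$(printf 'x%.0s' $(seq 1 60))"
redis-cli SET s2 "$(printf 'x%.0s' $(seq 1 70))"

# MEMORY USAGE reports the allocated size, not the exact byte count.
redis-cli MEMORY USAGE s1
redis-cli MEMORY USAGE s2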
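And here is a minimal sketch of the split described in question 3, assuming bash and redis-cli; the key scheme user:42:0 / user:42:1 and the use of cksum as a cheap hash of the field name are illustrative choices, not anything Redis provides:

# Route each field of the logical hash "user:42" to one of two physical
# hashes so that each stays below the threshold.
field="name"
shard=$(( $(printf '%s' "$field" | cksum | cut -d' ' -f1) % 2 ))
redis-cli HSET "user:42:$shard" "$field" "some-value"

# Reads must apply the same routing to find the field again.
redis-cli HGET "user:42:$shard" "$field"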

answered Oct 07 '22 by sel-fish

@sel-fish got it right: it's all about the memory allocator. I would like to add a little more information for the benefit of others.

I ran one more experiment comparing the time taken for the same operations under jemalloc and libc. I ran the two experiments under exactly the same circumstances for better clarity. I couldn't find any major difference in performance; libc wins most of the time.

I have attached the two screenshots below.

[Graph: memory while resizing, jemalloc vs. libc]

[Graph: time taken, jemalloc vs. libc]

So as you can see in the graphs, jemalloc's resizing roughly doubles the allocation, while libc increases it serially.

And there is no major performance drop (in time taken) from using libc. In fact, libc takes less time in most areas.

I have also read a few articles about libc and jemalloc. In my view, libc wins for this scenario.

I would also like to hear others' points of view on this.

answered Oct 07 '22 by Karthikeyan Gopall