Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Storing integers in a redis ordered set?

Tags:

redis

I have a system which deals with keys that have been turned into unsigned long integers (by packing short sequences into byte strings). I want to try storing these in Redis, and I want to do it in the best way possible. My concern is mainly memory efficiency.

From playing with the online REPL I notice that the two following are identical

zadd myset 1.0 "123"

zadd myset 1.0 123

This means that even if I know I want to store an integer, it has to be set as a string. I notice from the documentation that keys are just stored as char*s and that commands like SETBIT indicate that Redis is not averse to treating strings as bytestrings in the client. This hints at a slightly more efficient way of storing unsigned longs than as their string representation.

What is the best way to store unsigned longs in sorted sets?

like image 406
Joe Avatar asked Nov 01 '11 21:11

Joe


People also ask

Can Redis store integers?

Redis stores integers in their integer representation, so for string values that actually hold an integer, there is no overhead for storing the string representation of the integer.

Are Redis sets ordered?

A Redis sorted set is a collection of unique strings (members) ordered by an associated score. When more than one string has the same score, the strings are ordered lexicographically. Some use cases for sorted sets include: Leaderboards.

Are Redis keys always strings?

Keys in Redis are all strings, so it doesn't really matter what kind of value you pass into a client.

Is Redis key value store?

redis is an in-memory, key/value store. Think of it as a dictionary with any number of keys, each of which has a value that can be set or retrieved. However, Redis goes beyond a simple key/value store as it is actually a data structures server, supporting different kinds of values.


1 Answers

Thanks to Andre for his answer. Here are my findings.

Storing ints directly

Redis keys must be strings. If you want to pass an integer, it has to be some kind of string. For small, well-defined sets of values, Redis will parse the string into an integer, if it is one. My guess is that it will use this int to tailor its hash function (or even statically dimension a hash table based on the value). This works for small values (examples being the default values of 64 entries of a value of up to 512). I will test for larger values during my investigation.

http://redis.io/topics/memory-optimization

Storing as strings

The alternative is squashing the integer so it looks like a string.

It looks like it is possible to use any byte string as a key.

For my application's case it actually didn't make that much difference storing the strings or the integers. I imagine that the structure in Redis undergoes some kind of alignment anyway, so there may be some pre-wasted bytes anyway. The value is hashed in any case.

Using Python for my testing, so I was able to create the values using the struct.pack. long longs weigh in at 8 bytes, which is quite large. Given the distribution of integer values, I discovered that it could actually be advantageous to store the strings, especially when coded in hex.

As redis strings are "Pascal-style":

struct sdshdr {
    long len;
    long free;
    char buf[];
};

and given that we can store anything in there, I did a bit of extra Python to code the type into the shortest possible type:

def do_pack(prefix, number):
    """
    Pack the number into the best possible string. With a prefix char.
    """ 

    # char
    if number < (1 << 8*1):
        return pack("!cB", prefix, number)

    # ushort
    elif number < (1 << 8*2):
        return pack("!cH", prefix, number)

    # uint
    elif number < (1 << 8*4):
        return pack("!cI", prefix, number)

    # ulonglong
    elif number < (1 << 8*8):
        return pack("!cQ", prefix, number)

This appears to make an insignificant saving (or none at all). Probably due to struct padding in Redis. This also drives Python CPU through the roof, making it somewhat unattractive.

The data I was working with was 200000 zsets of consecutive integer => (weight, random integer) × 100, plus some inverted index (based on random data). dbsize yields 1,200,001 keys.

Final memory use of server: 1.28 GB RAM, 1.32 Virtual. Various tweaks made a difference of no more than 10 megabytes either way.

So my conclusion:

Don't bother encoding into fixed-size data types. Just store the integer as a string, in hex if you want. It won't make all that much difference.

References:

http://docs.python.org/library/struct.html

http://redis.io/topics/internals-sds

like image 143
Joe Avatar answered Nov 04 '22 03:11

Joe