
Compressing strings before putting them in redis - does it make sense?

A bit more detail: we're already trying to make the most of zipmaps, ziplists, etc., and I'm wondering whether these representations are already compressed or are just serialized hashes and lists. Does compression significantly reduce memory usage?

Also, does the compression overhead at the app server layer get offset by lower network usage? StackOverflow's experience suggests it does. Any other opinions?

In brief, does it make sense - for both short and longer strings?

Hristo asked Jul 02 '11 11:07



3 Answers

Redis does not compress your values, and whether you should compress them yourself depends a lot on the size of the strings you are going to store. For big strings, hundreds of kilobytes and more, it's probably worth the extra CPU cycles on the client side, just as it is when you serve web pages, but for shorter strings it's likely a waste of time. Short strings generally don't compress much, so the gain would be too small.
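To make the size effect concrete, here is a minimal sketch that compares compression ratios for a short and a long value using Python's standard `zlib` module. The sample payloads and the helper name `compression_ratio` are made up for illustration; real ratios depend entirely on your data.

```python
import json
import zlib


def compression_ratio(payload: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(payload)) / len(payload)


# Hypothetical sample values, not taken from any real dataset.
short_value = b'{"id": 42, "name": "Hristo"}'
long_value = b"".join(
    json.dumps({"id": i, "name": f"user-{i}", "tags": ["redis", "cache"]}).encode()
    for i in range(5000)
)

if __name__ == "__main__":
    for label, value in (("short", short_value), ("long", long_value)):
        ratio = compression_ratio(value)
        print(f"{label}: {len(value)} bytes, ratio {ratio:.2f}")
```

A ratio close to or above 1.0 means compression saved nothing once the zlib header and checksum are counted, which is the usual outcome for values of a few dozen bytes.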

Theo answered Oct 16 '22 16:10



There's a practical way to get good compression, even for very small strings (50 bytes!):

If your values are somewhat similar to each other - for example, they're JSON representations of a few related classes of objects - you can precompute a compressor/decompressor dictionary based on some example text.

It sounds complicated, but it's simple in practice - and simpler still with the right wrapper code to handle it.

Here's a Python implementation:

https://github.com/internetarchive/openlibrary/blob/master/openlibrary/utils/compress.py

and here's a wrapper for compressing a specific class of strings (short JSON records):

https://github.com/internetarchive/openlibrary/blob/master/openlibrary/utils/olcompress.py

One catch: to do this efficiently, your compression library must support 'cloning' the internal state (the Python library does). You can implement something similar by prepending the example text when compressing, but this means paying an extra computation cost.
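As a rough illustration of the cloning idea, the sketch below primes a `zlib` compressor and decompressor with example text once, then copies that primed state for every value. It is not the code from the linked repository; the class name `SeededCompressor` and the seed text are hypothetical, and it assumes every reader and writer shares exactly the same seed.

```python
import zlib


class SeededCompressor:
    """Compress short strings against a shared example text (the seed)."""

    def __init__(self, seed: bytes):
        # Feed the seed through a compressor once, then keep the primed state.
        self._compressor = zlib.compressobj()
        prefix = self._compressor.compress(seed)
        prefix += self._compressor.flush(zlib.Z_SYNC_FLUSH)
        # Prime a decompressor with the same prefix so it shares the history.
        self._decompressor = zlib.decompressobj()
        self._decompressor.decompress(prefix)

    def compress(self, data: bytes) -> bytes:
        # Clone the primed state so each value starts from the same history.
        c = self._compressor.copy()
        return c.compress(data) + c.flush(zlib.Z_SYNC_FLUSH)

    def decompress(self, blob: bytes) -> bytes:
        d = self._decompressor.copy()
        return d.decompress(blob)


# Hypothetical seed: a template resembling the JSON records to be stored.
SEED = b'{"type": "/type/edition", "title": "", "authors": [{"key": "/authors/"}], "publish_date": ""}'
codec = SeededCompressor(SEED)

record = b'{"type": "/type/edition", "title": "Dune", "authors": [{"key": "/authors/OL1A"}], "publish_date": "1965"}'
packed = codec.compress(record)

if __name__ == "__main__":
    print(len(record), len(zlib.compress(record)), len(packed))
```

Values compressed this way can only be read back with the same seed, so changing the seed later means rewriting the stored data. Python's `zlib.compressobj` and `zlib.decompressobj` also accept a preset dictionary through the `zdict` argument, which may be a simpler route to a similar result.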

Thanks to solrize for this awesome trick.

Mike McCabe answered Oct 16 '22 16:10



Redis and its clients are typically IO bound, and the IO costs are typically at least two orders of magnitude greater than the rest of the request/reply sequence. Smaller payloads will give you higher throughput and lower latencies.

I do not believe there are any hard and fast rules beyond: cost of compression << IO gains. You should benchmark it and find the sweet spot for the lower bound; the MTU of your network is not a bad starting point.
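One way to apply a lower bound is sketched below: values under a size threshold are stored as-is, larger ones are compressed, and a one-byte marker records which path was taken. The 1500-byte threshold assumes a common Ethernet MTU, and the marker scheme and the names `pack` and `unpack` are illustrative choices, not part of Redis or any client library.

```python
import zlib

# Assumption: 1500 bytes, a common Ethernet MTU. Benchmark and tune for your setup.
MIN_COMPRESS_SIZE = 1500

# Hypothetical one-byte markers so readers know how a value was stored.
RAW = b"\x00"
ZLIB = b"\x01"


def pack(value: bytes) -> bytes:
    """Compress values at or above the threshold, but only if that saves space."""
    if len(value) >= MIN_COMPRESS_SIZE:
        compressed = zlib.compress(value)
        if len(compressed) < len(value):
            return ZLIB + compressed
    return RAW + value


def unpack(stored: bytes) -> bytes:
    """Reverse pack() by dispatching on the leading marker byte."""
    marker, body = stored[:1], stored[1:]
    if marker == ZLIB:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError("unknown marker byte")
```

The application would call `pack` just before writing a value to Redis and `unpack` right after reading it. The marker byte costs one byte per value but lets compressed and uncompressed values coexist while the threshold is being tuned.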

alphazero answered Oct 16 '22 16:10
