Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filtering Redis Hash Entries

I'm using redis to store hashes with ~100k records per hash. I want to implement filtering (faceting) the records within a given hash. Note a hash entry can belong to n filters.

After reading this and this it looks like I should:

  1. Implement a sorted SET per filter. The values within the SET correspond to the keys within a HASH.
  2. Retrieve the HASH keys from the given filter SET.
  3. Once I have the HASH keys from the SET fetch the corresponding entries from the HASH. This should give me all entries that belong to the filter.

Firstly is the above approach correct at a high level?

Assuming the approach is OK the bit I'm missing is what's the most efficient implementation to retrieve the HASH entries? Am I right in thinking once I have the HASH keys I should then use a PIPELINE to queue multiple HGETALL commands passing through each HASH key? Is there a better approach?

My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command. I'll be paging the filtered results with 500 results per page. With multiple browser based clients performing filtering, not to mention the back end processes that populate the SETs and HASHes it sounds like there's potential for a lot of contention if PIPELINE does block. Could anyone provide a view on this?

If it helps I'm using 2.2.4 redis, predis for the web clients and servicestack for the back end.

Thanks, Paul

like image 628
Paul Avatar asked Apr 15 '11 10:04

Paul


People also ask

Does Redis use hash table?

It is implemented as a Hash table and handles collisions by Chaining. A Redis Hash can store up to 4 billion key value pairs. If the value is Integer, Redis hash allows you to atomically increment or decrement the value.

Is Redis hash ordered?

NO, Redis hash doesn't maintain the order. Since it's a hash, it's unordered.

What hashing algorithm does Redis use?

As it is known, Redis uses the CRC16 algorithm to map keys to hash slots.

Are Redis keys hashed?

Whereas LIST s and SET s in Redis hold sequences of items, Redis HASH es store a mapping of keys to values.


1 Answers

Redis is a lock-free non-blocking async server so there is no added contention when using pipelining. Redis hums along happily processing each operation as soon as it receives them so in practice can process multiple pipelined operations. In essence redis-server really doesn't care if the operation is pipelined or not it just processes each operation as it receives them.

The benefit of pipelining is to reduce client latency where instead of waiting for a response from redis-server for each operation before sending the next one, the client can just pump all operations at once in a single write then read back all the responses in a single read.

An example of this in action is in my Redis mini StackOverflow clone each click makes a call to ToQuestionResults() which because the operations are pipelined sends all operations on 1 Socket write call and reads the results in 1 Socket blocking read which is more efficient instead of a blocking read per call:

https://github.com/ServiceStack/ServiceStack.Examples/blob/master/src/RedisStackOverflow/RedisStackOverflow.ServiceInterface/IRepository.cs#L180

My concern about using a PIPELINE is that I believe it will block all other clients while servicing the command.

This is not a valid concern and I wouldn't over think how Redis works here, assume it's doing it the most efficiently where Pipelining doesn't block processing of other clients commands. Conceptually you can think that redis-server processes each command (pipelined or not) in FIFO order (i.e. no time is wasted in waiting/reading the entire pipeline).

You're describing something closer to MULTI/EXEC (i.e. Redis Transactions) where all operations are done at once as soon as Redis server reads EXEC (i.e. EOF Transaction). This is not a problem either and redis-server still doesn't waste any time waiting to receive your entire transaction, it just queues the partial commandset in a temporary queue until it receives the final EXEC which is then processed all at once.

This is how redis achieves atomicity by processing each command, one at a time, as soon as it receives them. Since there are no other threads, there is no thread context switching, no locks and no multi-threading issues. It basically achieves concurrency by processing each command really fast.

So in this case I would use Pipelining as it's always a win, more so the more commands you pipeline (as you reduce the blocking read count).

like image 176
mythz Avatar answered Sep 28 '22 02:09

mythz