Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Redis, how does SCAN cursor "state management" work?

Tags:

redis

Redis has a SCAN command that may be used to iterate keys matching a pattern etc.

Redis SCAN doc

You start by giving a cursor value of 0; each call returns a new cursor value which you pass into the next SCAN call. A value of 0 indicates iteration is finished. Supposedly no server or client state is needed (except for the cursor value)

I'm wondering how Redis implements the scanning algorithm-wise?

like image 662
seand Avatar asked Jan 23 '15 02:01

seand


People also ask

What is Scan Command in Redis?

As mentioned, the SCAN in Redis is a cursor-based iterator that allows you to iterate over the set of keys in a specific Redis database. The command accepts the cursor position as the argument. The server returns an update cursor every time the command is called.

Does Redis use memory optimization?

However Redis uses a memory optimization where small aggregate data types, until they reach a given amount of items or a given max size of single elements, are represented using a compact single-allocation packed encoding.

What is the difference between hscan and sscan in Redis?

SCAN iterates the set of keys in the currently selected Redis database. SSCAN iterates elements of Sets types. HSCAN iterates fields of Hash types and their associated values. ZSCAN iterates elements of Sorted Set types and their associated scores.

What is an iteration in scan?

SCAN is a cursor based iterator. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call. An iteration starts when the cursor is set to 0, and terminates when the cursor returned by the server is 0. The following is an example of SCAN iteration:


1 Answers

You may find answer in redis dict.c source file. Then I will quote part of it.

Iterating works the following way:

  1. Initially you call the function using a cursor (v) value of 0. 2)
  2. The function performs one step of the iteration, and returns the
    new cursor value you must use in the next call.
  3. When the returned cursor is 0, the iteration is complete.

The function guarantees all elements present in the dictionary get returned between the start and end of the iteration. However it is possible some elements get returned multiple times. For every element returned, the callback argument 'fn' is called with 'privdata' as first argument and the dictionary entry'de' as second argument.

How it works

The iteration algorithm was designed by Pieter Noordhuis. The main idea is to increment a cursor starting from the higher order bits. That is, instead of incrementing the cursor normally, the bits of the cursor are reversed, then the cursor is incremented, and finally the bits are reversed again.

This strategy is needed because the hash table may be resized between iteration calls. dict.c hash tables are always power of two in size, and they use chaining, so the position of an element in a given table is given by computing the bitwise AND between Hash(key) and SIZE-1 (where SIZE-1 is always the mask that is equivalent to taking the rest of the division between the Hash of the key and SIZE).

For example if the current hash table size is 16, the mask is (in binary) 1111. The position of a key in the hash table will always be the last four bits of the hash output, and so forth.

What happens if the table changes in size?

If the hash table grows, elements can go anywhere in one multiple of the old bucket: for example let's say we already iterated with a 4 bit cursor 1100 (the mask is 1111 because hash table size = 16).

If the hash table will be resized to 64 elements, then the new mask will be 111111. The new buckets you obtain by substituting in ??1100 with either 0 or 1 can be targeted only by keys we already visited when scanning the bucket 1100 in the smaller hash table.

By iterating the higher bits first, because of the inverted counter, the cursor does not need to restart if the table size gets bigger. It will continue iterating using cursors without '1100' at the end, and also without any other combination of the final 4 bits already explored.

Similarly when the table size shrinks over time, for example going from 16 to 8, if a combination of the lower three bits (the mask for size 8 is 111) were already completely explored, it would not be visited again because we are sure we tried, for example, both 0111 and 1111 (all the variations of the higher bit) so we don't need to test it again.

Wait... You have TWO tables during rehashing!

Yes, this is true, but we always iterate the smaller table first, then we test all the expansions of the current cursor into the larger table. For example if the current cursor is 101 and we also have a larger table of size 16, we also test (0)101 and (1)101 inside the larger table. This reduces the problem back to having only one table, where the larger one, if it exists, is just an expansion of the smaller one.

Limitations

This iterator is completely stateless, and this is a huge advantage, including no additional memory used. The disadvantages resulting from this design are:

  1. It is possible we return elements more than once. However this is usually easy to deal with in the application level.
  2. The iterator must return multiple elements per call, as it needs to always return all the keys chained in a given bucket, and all the expansions, so we are sure we don't miss keys moving during rehashing.
  3. The reverse cursor is somewhat hard to understand at first, but this comment is supposed to help.
like image 71
Nick Bondarenko Avatar answered Oct 18 '22 22:10

Nick Bondarenko