Redis, how does SCAN cursor "state management" work?

Tags: redis

Redis has a SCAN command that can be used to iterate over keys, optionally filtered by a pattern.

Redis SCAN doc

You start by passing a cursor value of 0; each call returns a new cursor value, which you pass into the next SCAN call. A returned value of 0 indicates that the iteration is finished. Supposedly no server or client state is needed (except for the cursor value).

I'm wondering how Redis implements the scan, algorithm-wise.

asked Jan 23 '15 02:01

seand

1 Answer

You can find the answer in the Redis dict.c source file. I will quote the relevant part of it below.

Iterating works the following way:

1) Initially you call the function using a cursor (v) value of 0.
2) The function performs one step of the iteration, and returns the new cursor value you must use in the next call.
3) When the returned cursor is 0, the iteration is complete.
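As a rough illustration of those three steps from the caller's side, here is a minimal Python sketch. The names scan_all and fake_scan_step are made up for this example, and the fake step pages through a tuple by offset, which is NOT how Redis computes its cursor; it only shows the "start at 0, feed the cursor back, stop at 0" contract.

```python
def scan_all(scan_step):
    """Drive a cursor-based iteration until the returned cursor is 0."""
    cursor = 0
    items = []
    while True:
        # Each call performs one step and hands back the next cursor.
        cursor, batch = scan_step(cursor)
        items.extend(batch)
        if cursor == 0:          # a returned cursor of 0 means "done"
            return items


def fake_scan_step(cursor, data=("a", "b", "c", "d", "e"), count=2):
    """Hypothetical stand-in for a server: pages through `data` by offset."""
    nxt = cursor + count
    next_cursor = 0 if nxt >= len(data) else nxt
    return next_cursor, list(data[cursor:nxt])


result = scan_all(fake_scan_step)
print(result)  # ['a', 'b', 'c', 'd', 'e']
```

The only thing the caller keeps between calls is the cursor integer itself.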

The function guarantees all elements present in the dictionary get returned between the start and end of the iteration. However it is possible some elements get returned multiple times. For every element returned, the callback argument 'fn' is called with 'privdata' as first argument and the dictionary entry 'de' as second argument.

How it works

The iteration algorithm was designed by Pieter Noordhuis. The main idea is to increment a cursor starting from the higher order bits. That is, instead of incrementing the cursor normally, the bits of the cursor are reversed, then the cursor is incremented, and finally the bits are reversed again.
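To make the "reverse, increment, reverse" idea concrete, here is a small Python sketch. The helper names (reverse_bits, next_cursor, cursor_sequence) are mine, and the code works on an explicit bit width rather than on the machine word that dict.c uses, so treat it as an illustration of the idea and not as the actual implementation.

```python
def reverse_bits(v, nbits):
    """Reverse the lowest `nbits` bits of v."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (v & 1)
        v >>= 1
    return r


def next_cursor(v, nbits):
    """Advance the cursor for a table of 2**nbits buckets.

    Reverse the bits, add one, reverse again, so the carry
    moves from the high-order bits towards the low-order bits.
    """
    r = reverse_bits(v, nbits)
    r = (r + 1) & ((1 << nbits) - 1)   # wraps to 0 after the last bucket
    return reverse_bits(r, nbits)


def cursor_sequence(nbits):
    """All cursors visited in one full pass, starting from 0."""
    seq, v = [], 0
    while True:
        seq.append(v)
        v = next_cursor(v, nbits)
        if v == 0:
            return seq


print(cursor_sequence(3))                            # [0, 4, 2, 6, 1, 5, 3, 7]
print([format(v, "03b") for v in cursor_sequence(3)])
```

For an 8-bucket table the visiting order is 000, 100, 010, 110, 001, 101, 011, 111: the high bit flips fastest and the low bit slowest.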

This strategy is needed because the hash table may be resized between iteration calls. dict.c hash tables are always power of two in size, and they use chaining, so the position of an element in a given table is given by computing the bitwise AND between Hash(key) and SIZE-1 (where SIZE-1 is always the mask that is equivalent to taking the rest of the division between the Hash of the key and SIZE).

For example if the current hash table size is 16, the mask is (in binary) 1111. The position of a key in the hash table will always be the last four bits of the hash output, and so forth.
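A quick Python check of the masking rule, assuming the hash is just an integer. The function name bucket_index is made up for this example.

```python
def bucket_index(hash_value, size):
    """Position of a key in a power-of-two sized table: hash AND (size - 1)."""
    assert size > 0 and size & (size - 1) == 0, "size must be a power of two"
    return hash_value & (size - 1)


h = 0b10110110                      # 182 in decimal
idx = bucket_index(h, 16)
print(format(idx, "04b"))           # 0110, the last four bits of the hash
print(idx == h % 16)                # True: the mask is the remainder of the division
```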

What happens if the table changes in size?

If the hash table grows, elements can go anywhere in one multiple of the old bucket: for example let's say we already iterated with a 4 bit cursor 1100 (the mask is 1111 because hash table size = 16).

If the hash table will be resized to 64 elements, then the new mask will be 111111. The new buckets you obtain by substituting in ??1100 with either 0 or 1 can be targeted only by keys we already visited when scanning the bucket 1100 in the smaller hash table.

By iterating the higher bits first, because of the inverted counter, the cursor does not need to restart if the table size gets bigger. It will continue iterating using cursors without '1100' at the end, and also without any other combination of the final 4 bits already explored.
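The growth case can be checked with a toy simulation in Python. This is only a sketch under simplifying assumptions: each key's hash is a small integer, a "table" is modelled by filtering on hash AND mask, and the names reverse_bits and scan_step are mine, not Redis functions.

```python
def reverse_bits(v, nbits):
    """Reverse the lowest `nbits` bits of v."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (v & 1)
        v >>= 1
    return r


def scan_step(hashes, cursor, nbits):
    """Visit one bucket of a 2**nbits table; return (next_cursor, keys_in_bucket)."""
    mask = (1 << nbits) - 1
    bucket = cursor & mask
    batch = [h for h in hashes if h & mask == bucket]
    nxt = reverse_bits((reverse_bits(bucket, nbits) + 1) & mask, nbits)
    return nxt, batch


hashes = list(range(256))   # pretend each key's hash is a small integer
seen = []
cursor = 0

# Four calls against the 16-bucket table visit buckets 0000, 1000, 0100, 1100.
for _ in range(4):
    cursor, batch = scan_step(hashes, cursor, 4)
    seen.extend(batch)

# The table grows to 64 buckets; keep going with the same cursor.
while True:
    cursor, batch = scan_step(hashes, cursor, 6)
    seen.extend(batch)
    if cursor == 0:
        break

print(sorted(seen) == hashes)   # True: every key was returned exactly once
```

In this particular run nothing is returned twice: the 64-bucket pass only touches buckets whose last two bits are not 00, and those were exactly the ones covered by the first four calls.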

Similarly when the table size shrinks over time, for example going from 16 to 8, if a combination of the lower three bits (the mask for size 8 is 111) were already completely explored, it would not be visited again because we are sure we tried, for example, both 0111 and 1111 (all the variations of the higher bit) so we don't need to test it again.
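The shrinking case can be simulated the same way. Again this is a toy Python model with made-up helper names (reverse_bits, scan_step) and integer hashes, not the Redis code. Here the table shrinks while the two expansions of bucket 010 are only half explored, so some keys come back twice, but none are missed.

```python
def reverse_bits(v, nbits):
    """Reverse the lowest `nbits` bits of v."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (v & 1)
        v >>= 1
    return r


def scan_step(hashes, cursor, nbits):
    """Visit one bucket of a 2**nbits table; return (next_cursor, keys_in_bucket)."""
    mask = (1 << nbits) - 1
    bucket = cursor & mask            # a cursor from a larger table is masked down
    batch = [h for h in hashes if h & mask == bucket]
    nxt = reverse_bits((reverse_bits(bucket, nbits) + 1) & mask, nbits)
    return nxt, batch


hashes = list(range(256))   # pretend each key's hash is a small integer
seen = []
cursor = 0

# Five calls against the 16-bucket table: buckets 0000, 1000, 0100, 1100, 0010.
for _ in range(5):
    cursor, batch = scan_step(hashes, cursor, 4)
    seen.extend(batch)

# The table shrinks to 8 buckets; the cursor (1010) now maps to bucket 010.
while True:
    cursor, batch = scan_step(hashes, cursor, 3)
    seen.extend(batch)
    if cursor == 0:
        break

print(set(seen) == set(hashes))       # True: nothing was missed
print(len(seen) - len(set(seen)))     # 16 keys (those ending in 0010) came back twice
```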

Wait... You have TWO tables during rehashing!

Yes, this is true, but we always iterate the smaller table first, then we test all the expansions of the current cursor into the larger table. For example if the current cursor is 101 and we also have a larger table of size 16, we also test (0)101 and (1)101 inside the larger table. This reduces the problem back to having only one table, where the larger one, if it exists, is just an expansion of the smaller one.
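Here is a small Python sketch of "test all the expansions of the current cursor". The names expansions and scan_step_two_tables are hypothetical, and the tables are plain lists of buckets holding integer hashes; the real dict.c walks the expansions with the same reversed-increment trick rather than a simple loop.

```python
def expansions(cursor, small_size, large_size):
    """Bucket indexes in the larger table whose low bits equal the small-table bucket."""
    base = cursor & (small_size - 1)
    return [base | (hi * small_size) for hi in range(large_size // small_size)]


def scan_step_two_tables(small, large, cursor):
    """Return the keys of one small-table bucket plus all its expansions."""
    base = cursor & (len(small) - 1)
    batch = list(small[base])
    for idx in expansions(cursor, len(small), len(large)):
        batch.extend(large[idx])
    return batch


# Toy rehash in progress: hashes 0..15 still live in the 8-bucket table,
# hashes 16..31 have already been moved to the 16-bucket table.
small = [[h for h in range(16) if h & 7 == b] for b in range(8)]
large = [[h for h in range(16, 32) if h & 15 == b] for b in range(16)]

print(expansions(0b101, 8, 16))                     # [5, 13], i.e. (0)101 and (1)101
print(scan_step_two_tables(small, large, 0b101))    # [5, 13, 21, 29]
```

Every key whose hash ends in 101 is returned by that one step, whichever table it currently sits in.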

Limitations

This iterator is completely stateless, and this is a huge advantage, including no additional memory used. The disadvantages resulting from this design are:

1) It is possible we return elements more than once. However this is usually easy to deal with in the application level.
2) The iterator must return multiple elements per call, as it needs to always return all the keys chained in a given bucket, and all the expansions, so we are sure we don't miss keys moving during rehashing.
3) The reverse cursor is somewhat hard to understand at first, but this comment is supposed to help.
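For the first limitation, one simple way to deal with duplicates on the application side is to remember what you have already seen. A minimal Python sketch, where unique_items is a name I made up and batches stands for the per-call results of the iteration:

```python
def unique_items(batches):
    """Yield each item once, in first-seen order, across all returned batches."""
    seen = set()
    for batch in batches:
        for item in batch:
            if item not in seen:
                seen.add(item)
                yield item


batches = [["a", "b"], ["b", "c"], ["a"]]
deduped = list(unique_items(batches))
print(deduped)  # ['a', 'b', 'c']
```

Note that this keeps every key in client memory, so it trades the iterator's statelessness for state in the application. If the per-key work is idempotent, skipping the set and simply processing a key twice may be the cheaper option.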

answered Oct 18 '22 22:10

Nick Bondarenko
