Redis cluster live reshard failure

Tags: redis, redis-cluster

We're using redis-cluster extensively in our production environment. We currently have a 30-node cluster (15 masters, 15 slaves). We're trying to grow the cluster, so we created new servers and joined them to the cluster. So far all is well.

Next, we're trying to reshard slots to the new masters. We wrote a script that does this using the redis-trib reshard command.

However, the migration fails partway through (not very far from the start) with this error:

[ERR] Calling MIGRATE: ERR Target instance replied with error: BUSYKEY Target key name already exists.

This happens sporadically: sometimes it manages to move some slots before failing, sometimes it fails on the first slot. Each failure requires a manual fix, which makes the reshard operation very hard to manage.

We have not found any concrete example of this, nor any way to prevent it other than a migration with downtime, which we are trying to avoid.

Versions:
redis server 4.0.2
redis trib 3.3.3 (downgraded from 4.0.2 following this issue: redis cluster reshard [ERR] Calling MIGRATE: ERR Syntax error)

Our next step is to upgrade to the latest Redis (4.0.11), even though the release notes don't mention this issue.

We're hoping to hear that we're doing something wrong and how to fix it. Or is redis-cluster simply not built for live resharding?

Thanks

asked Oct 31 '18 09:10

Shlomi Sutton

1 Answer

I faced a similar problem while adding Redis Cluster support to our own project. I found that the redis-trib reshard command has a problem: it works fine only as long as the slots being migrated from one master to another contain no keys.

Redis 5 (still in development and not yet stable at the time of writing) moves cluster management into its own `redis-cli --cluster`, which I think handles resharding without this problem. It seems to happen only with versions earlier than 5.
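For reference, here is a minimal sketch of the Redis 5 equivalent of redis-trib reshard, run non-interactively. The `--cluster-from`, `--cluster-to`, `--cluster-slots` and `--cluster-yes` options are the ones described in the Redis Cluster tutorial; the wrapper name `reshard_with_cli` and the `REDIS_CLI` override (handy for printing the command instead of running it) are hypothetical conveniences, not part of redis-cli.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Redis 5+ `redis-cli --cluster reshard` subcommand.
# Arguments: entry node (host:port), source node ID, destination node ID, slot count.
# REDIS_CLI may be overridden (e.g. REDIS_CLI=echo) to print instead of execute.
reshard_with_cli() {
    local entry=$1 from_id=$2 to_id=$3 nslots=$4
    "${REDIS_CLI:-redis-cli}" --cluster reshard "$entry" \
        --cluster-from "$from_id" \
        --cluster-to "$to_id" \
        --cluster-slots "$nslots" \
        --cluster-yes    # answer "yes" to the confirmation prompt
}
```

For example, `reshard_with_cli 172.17.0.5:6379 <source-node-id> <destination-node-id> 5461` would move 5461 slots in one run.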

If you look at the official Redis docs on live reconfiguration and cluster resharding, you'll find what is done internally to reshard.

So I solved the problem by performing those steps myself in a bash script instead of running the redis-trib reshard command.

Suppose you want to reshard some slots from one master node to another. We'll call the node that currently owns the hash slot the source node, and the node we want to migrate it to the destination node.
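The CLUSTER SETSLOT commands identify nodes by node ID, not by address. A minimal sketch for looking up a node's ID from CLUSTER NODES output, assuming the Redis 4.0+ address field `ip:port@cport` (the `@cport` suffix is stripped, so older `ip:port` output matches too); `node_id_for` is a hypothetical helper name. Alternatively, sending CLUSTER MYID to a node returns that node's own ID.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the node ID of the node listening on the given ip:port.
# Reads CLUSTER NODES output on stdin: field 1 is the node ID, field 2 the address.
node_id_for() {
    awk -v addr="$1" '{
        a = $2
        sub(/@.*/, "", a)                  # drop "@cport" (and anything after it)
        if (a == addr) { print $1; exit }  # first match wins
    }'
}
```

Usage: `redis-cli -h 172.17.0.5 cluster nodes | node_id_for 172.17.0.4:6379`.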

For each slot, perform the following steps. According to the official Redis docs, the order of these steps matters.

1. Send CLUSTER SETSLOT <slot> IMPORTING <source-node-id> to the destination node to put the slot in the importing state.
2. Send CLUSTER SETSLOT <slot> MIGRATING <destination-node-id> to the source node to put the slot in the migrating state.
3. Get keys from the source node with the CLUSTER GETKEYSINSLOT command and move them to the destination node using the MIGRATE command:

MIGRATE target_host target_port key target_database_id timeout

In Redis Cluster there is no need to specify a database other than 0, but MIGRATE is a general command that can be used for other tasks not involving Redis Cluster.
4. When the migration is finished, send CLUSTER SETSLOT <slot> NODE <destination-node-id> to the destination node and then to the source node to return the slot to its normal state. The same command is usually sent to all other nodes as well, to avoid waiting for the new configuration to propagate naturally across the cluster.
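In its simplest form, step 3 moves one key per MIGRATE call. Since Redis 3.0.6, MIGRATE also accepts an empty key argument followed by `KEYS key1 key2 ...`, which moves a whole batch per round trip, and the `REPLACE` option (Redis 3.0+) overwrites a key that already exists on the destination instead of failing with BUSYKEY. The sketch below assumes bash 4+ (for `mapfile`) and key names without newlines; `migrate_slot_keys`, the `REDIS_CLI` override and the `REPLACE=1` switch are hypothetical conveniences. Use REPLACE only when you have confirmed that the destination's copy of a key is stale, since that copy is discarded.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 3: move a slot's keys in batches of up to $batch.
# Assumes bash 4+ (mapfile) and key names that contain no newlines.
# REDIS_CLI may name another command (e.g. a test stub); default is redis-cli.
# REPLACE=1 adds the REPLACE option, overwriting keys already on the destination.
migrate_slot_keys() {
    local src=$1 dst_host=$2 dst_port=$3 slot=$4 batch=${5:-100}
    local cli=${REDIS_CLI:-redis-cli} keys reply
    local -a names opts=()
    if [ "${REPLACE:-0}" = 1 ]; then
        opts+=(REPLACE)
    fi
    while true; do
        # Next batch of key names still stored in this slot on the source node
        keys=$("$cli" -h "$src" cluster getkeysinslot "$slot" "$batch")
        if [ -z "$keys" ]; then
            return 0    # slot is empty on the source: step 3 is done
        fi
        mapfile -t names <<< "$keys"    # one key name per output line
        # Empty key argument + KEYS moves the batch in one call (db 0, 5000 ms timeout)
        reply=$("$cli" -h "$src" migrate "$dst_host" "$dst_port" "" 0 5000 "${opts[@]}" KEYS "${names[@]}")
        if [ "$reply" != "OK" ] && [ "$reply" != "NOKEY" ]; then
            echo "MIGRATE in slot $slot failed: $reply" >&2
            return 1    # stop instead of retrying the same keys forever
        fi
    done
}
```

Usage: `migrate_slot_keys 172.17.0.5 172.17.0.4 6379 42` moves every key of slot 42 from 172.17.0.5 to 172.17.0.4, 100 keys per call. Returning on the first error keeps the slot in its migrating/importing state, so nothing is silently skipped.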

Here is a simple example bash script that does this for the following nodes:

Source node: 172.17.0.5 (ID 1f70a5107e0042a7d33a9efaf88dbdfecd78076a)

Destination node: 172.17.0.4 (ID 7e428bae84697a3882ecad19bd0d13ac7ee97d02)

Another master: 172.17.0.7

for i in $(seq 0 5460); do
    # Destination first: mark the slot as importing from the source node
    redis-cli -h 172.17.0.4 cluster setslot "$i" importing 1f70a5107e0042a7d33a9efaf88dbdfecd78076a
    # Source: mark the slot as migrating to the destination node
    redis-cli -h 172.17.0.5 cluster setslot "$i" migrating 7e428bae84697a3882ecad19bd0d13ac7ee97d02
    # Move the slot's keys one at a time until the source has none left
    while true; do
        key=$(redis-cli -h 172.17.0.5 cluster getkeysinslot "$i" 1)
        if [ -z "$key" ]; then
            echo "no keys left in slot $i"
            break
        fi
        reply=$(redis-cli -h 172.17.0.5 migrate 172.17.0.4 6379 "$key" 0 5000)
        # Abort instead of retrying the same key forever (e.g. on BUSYKEY)
        if [ "$reply" != "OK" ] && [ "$reply" != "NOKEY" ]; then
            echo "MIGRATE of '$key' in slot $i failed: $reply" >&2
            exit 1
        fi
    done
    # Assign the slot: destination first, then source, then the other master(s)
    for host in 172.17.0.4 172.17.0.5 172.17.0.7; do
        redis-cli -h "$host" cluster setslot "$i" node 7e428bae84697a3882ecad19bd0d13ac7ee97d02
    done
done
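If a reshard stops partway (for example after a MIGRATE error), the slot being moved stays in the migrating state on the source and the importing state on the destination until it is closed. A node reports these open slots only on its own ("myself") line of CLUSTER NODES, as `[slot->-target-id]` or `[slot-<-source-id]`, so a minimal sketch for spotting them has to be run against each master; `open_slots` is a hypothetical helper name. `redis-trib.rb fix` (`redis-cli --cluster fix` in Redis 5) is the usual tool for closing open slots.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list slots left open on the queried node.
# Reads CLUSTER NODES output on stdin. Open slots follow the slot ranges
# (field 9 onward) as [slot->-target-id] (migrating) or [slot-<-source-id] (importing).
# Output: "<slot> migrating <this-node-id> <target-id>" or
#         "<slot> importing <this-node-id> <source-id>".
open_slots() {
    awk '{
        for (i = 9; i <= NF; i++) {
            if ($i !~ /^\[/) continue              # plain slot or range, not open
            f = substr($i, 2, length($i) - 2)      # strip the surrounding brackets
            if (split(f, p, "->-") == 2)
                print p[1], "migrating", $1, p[2]
            else if (split(f, p, "-<-") == 2)
                print p[1], "importing", $1, p[2]
        }
    }'
}
```

Usage: `for h in 172.17.0.4 172.17.0.5 172.17.0.7; do redis-cli -h "$h" cluster nodes | open_slots; done`.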

answered Sep 24 '22 16:09

Shudipta Sharma
