We're using redis-cluster extensively in our production env. We currently have a 30 node cluster (15 masters, 15 slaves) We're trying to increase the cluster, for that we've created new servers & joined them to the cluster. so far all is well.
Next - we're trying to reshard the slots to the new masters. we wrote a script that does this, using the redis-trib
reshard
command.
However - the migration fails midway (but not very far from the start) with this error:
[ERR] Calling MIGRATE: ERR Target instance replied with error: BUSYKEY Target key name already exists.
This happens sporadically, at times it manages to move some slots before failing, at times it fails on the first slot. Each such failure requires a manual fixing operation which makes the reshard operation very hard to manage.
We have not found any concrete example of this, nor any idea on how to prevent this other than a downtime migration. which we are trying to avoid.
Versions:
redis server 4.0.2
redis trib 3.3.3 (downgraded from 4.0.2 following this issue : redis cluster reshard [ERR] Calling MIGRATE: ERR Syntax error)
Our next step is to upgrade to latest redis (4.0.11), even though we didn't find any indication in the release notes of this issue.
Hoping to hear we're doing something wrong and how to fix it, or is redis-cluster not built for live resharding ?
Thanks
Redis uses precisely this kind of failure detector. Faulty processes will eventually fail the heartbeat test, and will be (eventually) reported dead. Live processes pass the heartbeat test with some quantifiable likelihood — depending on the distribution of propagation delays within the asynchronous system under study.
If you ever wonder why 16384 (or 0 – 16383), Salvatore has a good explanation on this. Basically, Redis chooses 16384 slots because 16384 messages only occupy 2k, while 65535 would require 8k. For its Cluster scale design considerations, the cluster supports a maximum of 1000 shards.
Shards support replication. Within a shard, one node functions as the read/write primary node. All the other nodes in a shard function as read-only replicas of the primary node. Redis version 3.2 and later support multiple shards within a cluster (in the API and CLI, a replication group).
Redis Enterprise is a self-managed, real-time data platform that unlocks the full potential of Redis at scale, ensuring five-nines (5-9s) high availability. Redis Enterprise is architected to provide automated database resilience and mitigate hardware failure and cloud outages risks.
I have faced like this problem while working with redis-clustering support for our own project. I found a problem with the redis-trib reshard
command. It works fine if no key is stored in slots those are migrating from one master to another.
But redis-5 (still developing, not stable yet) has it's own `redis-cli' that has no problem with resharding command I think. Only for lower versions of 5 it happens.
If you look at the official docs for redis say redis reconfiguration and redis cluster resharding, you'll find what they do internally to reshard.
So I solved the problem by doing those tasks by running a bash script instead of running redis-trib reshard
command.
Suppose you want to reshard some slots from a master node to other master node. We'll call the node that has the current ownership of the hash slot the source node, and the node where we want to migrate the destination node.
For each slot do the following steps:
Remember that the order of these steps is important here according to redis official docs.
CLUSTER SETSLOT <slot> IMPORTING <source-node-id>
to destination node to set the slot to importing state.CLUSTER SETSLOT <slot> MIGRATING <destination-node-id>
to source node to set the slot to migrating state.Get keys from the source node with CLUSTER GETKEYSINSLOT command and move them into the destination node using the following MIGRATE command.
MIGRATE target_host target_port key target_database_id timeout
In Redis Cluster there is no need to specify a database other than 0, but MIGRATE is a general command that can be used for other tasks not involving Redis Cluster.
CLUSTER SETSLOT <slot> NODE <destination-node-id>
in both source node and destination node in order to set the slot to their normal state again. The same command is usually sent to all other nodes to avoid waiting for the natural propagation of the new configuration across the cluster.A simple example bash script to do this is also given here:
source-ip: 172.17.0.5
. source-id: 1f70a5107e0042a7d33a9efaf88dbdfecd78076a
destination-ip: 172.17.0.4
. destination-id: 7e428bae84697a3882ecad19bd0d13ac7ee97d02
another master ip: 172.17.0.7
for i in `seq 0 5460`; do
redis-cli -c -h 172.17.0.4 cluster setslot ${i} importing 1f70a5107e0042a7d33a9efaf88dbdfecd78076a
redis-cli -c -h 172.17.0.5 cluster setslot ${i} migrating 7e428bae84697a3882ecad19bd0d13ac7ee97d02
while true; do
key=`redis-cli -c -h 172.17.0.5 cluster getkeysinslot ${i} 1`
if [ "" = "$key" ]; then
echo "there are no key in this slot ${i}"
break
fi
redis-cli -h 172.17.0.5 migrate 172.17.0.4 6379 ${key} 0 5000
done
redis-cli -c -h 172.17.0.5 cluster setslot ${i} node 7e428bae84697a3882ecad19bd0d13ac7ee97d02
redis-cli -c -h 172.17.0.4 cluster setslot ${i} node 7e428bae84697a3882ecad19bd0d13ac7ee97d02
redis-cli -c -h 172.17.0.7 cluster setslot ${i} node 7e428bae84697a3882ecad19bd0d13ac7ee97d02
done
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With