Redis cluster live reshard failure

We're using redis-cluster extensively in our production environment. We currently have a 30-node cluster (15 masters, 15 slaves). We're trying to grow the cluster, so we've created new servers and joined them to the cluster. So far all is well.

Next, we're trying to reshard slots to the new masters. We wrote a script that does this using the redis-trib reshard command.

However, the migration fails partway through (not far from the start) with this error: [ERR] Calling MIGRATE: ERR Target instance replied with error: BUSYKEY Target key name already exists.

This happens sporadically: sometimes it manages to move some slots before failing, sometimes it fails on the first slot. Each failure requires a manual fix, which makes the reshard operation very hard to manage.

We have not found any concrete example of this, nor any way to prevent it other than a migration with downtime, which we are trying to avoid.

Versions:
redis server 4.0.2
redis-trib 3.3.3 (downgraded from 4.0.2 following this issue: redis cluster reshard [ERR] Calling MIGRATE: ERR Syntax error)

Our next step is to upgrade to the latest Redis (4.0.11), even though we didn't find any mention of this issue in the release notes.

Hoping to hear that we're doing something wrong and how to fix it. Or is redis-cluster not built for live resharding?

Thanks

asked Oct 31 '18 09:10 by Shlomi Sutton


1 Answer

I faced a similar problem while working on Redis Cluster support for our own project. I found a problem with the redis-trib reshard command: it works fine only if no keys are stored in the slots being migrated from one master to another.

Redis 5 (still in development and not yet stable at the time) moves this functionality into `redis-cli` itself (`redis-cli --cluster reshard`), and I think it does not have this resharding problem. It seems to happen only with versions lower than 5.

If you look at the official Redis documentation on cluster reconfiguration and resharding, you'll find the steps that are performed internally to reshard.

So I solved the problem by performing those steps in a bash script instead of running the redis-trib reshard command.

Suppose you want to reshard some slots from one master node to another. We'll call the node that currently owns the hash slot the source node, and the node to which we want to migrate it the destination node.

For each slot, perform the following steps:

Remember that, according to the official Redis docs, the order of these steps is important.

  1. Send CLUSTER SETSLOT <slot> IMPORTING <source-node-id> to destination node to set the slot to importing state.
  2. Send CLUSTER SETSLOT <slot> MIGRATING <destination-node-id> to source node to set the slot to migrating state.
  3. Get keys from the source node with the CLUSTER GETKEYSINSLOT command and move them to the destination node using the following MIGRATE command:

    MIGRATE target_host target_port key target_database_id timeout

    In Redis Cluster there is no need to specify a database other than 0, but MIGRATE is a general command that can be used for other tasks not involving Redis Cluster.

  4. When the migration process is finished, send CLUSTER SETSLOT <slot> NODE <destination-node-id> to both the destination node and the source node to return the slot to its normal state. The same command is usually sent to all other nodes as well, to avoid waiting for the new configuration to propagate naturally across the cluster.
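To make the ordering concrete, here is a dry-run sketch in bash that only prints the commands for one slot in the order listed above; it sends nothing to Redis. The function name `print_slot_plan` and its arguments are hypothetical, and the port 6379, the batch size of 100 and the 5000 ms timeout are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Dry run: print, in order, the commands that move ONE hash slot from a
# source master to a destination master. Nothing is sent to Redis.
# The function name and its arguments are hypothetical, for illustration only.
print_slot_plan() {
    local slot=$1 src_host=$2 src_id=$3 dst_host=$4 dst_id=$5
    shift 5   # any remaining arguments are other master hosts
    local host
    # Step 1: destination first, so it is ready to accept ASK-redirected queries.
    echo "redis-cli -h $dst_host cluster setslot $slot importing $src_id"
    # Step 2: then the source; from now on it answers missing keys with ASK.
    echo "redis-cli -h $src_host cluster setslot $slot migrating $dst_id"
    # Step 3: repeat these two until GETKEYSINSLOT returns nothing.
    echo "redis-cli -h $src_host cluster getkeysinslot $slot 100"
    echo "redis-cli -h $src_host migrate $dst_host 6379 <key> 0 5000"
    # Step 4: record the new owner on the destination, the source, then the other masters.
    for host in "$dst_host" "$src_host" "$@"; do
        echo "redis-cli -h $host cluster setslot $slot node $dst_id"
    done
}

# Example: the plan for slot 42 with the node addresses used in the script below.
print_slot_plan 42 172.17.0.5 1f70a5107e0042a7d33a9efaf88dbdfecd78076a \
    172.17.0.4 7e428bae84697a3882ecad19bd0d13ac7ee97d02 172.17.0.7
```

Printing the plan first is a cheap way to check node IDs and addresses before anything is changed in the cluster.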

Here is a simple example bash script that does this:

Source IP: 172.17.0.5, source ID: 1f70a5107e0042a7d33a9efaf88dbdfecd78076a

Destination IP: 172.17.0.4, destination ID: 7e428bae84697a3882ecad19bd0d13ac7ee97d02

Another master's IP: 172.17.0.7

The script moves slots 0 to 5460, which the source owns in this example; all nodes listen on the default port 6379.

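SRC_HOST=172.17.0.5
SRC_ID=1f70a5107e0042a7d33a9efaf88dbdfecd78076a
DST_HOST=172.17.0.4
DST_ID=7e428bae84697a3882ecad19bd0d13ac7ee97d02
OTHER_MASTER=172.17.0.7

for i in $(seq 0 5460); do
    # 1. Mark the slot as importing on the destination (this must come first).
    redis-cli -h "$DST_HOST" cluster setslot "$i" importing "$SRC_ID"
    # 2. Mark the slot as migrating on the source.
    redis-cli -h "$SRC_HOST" cluster setslot "$i" migrating "$DST_ID"
    # 3. Move keys one at a time until the slot is empty on the source.
    while true; do
        key=$(redis-cli -h "$SRC_HOST" cluster getkeysinslot "$i" 1)
        if [ -z "$key" ]; then
            echo "there are no keys left in slot $i"
            break
        fi
        reply=$(redis-cli -h "$SRC_HOST" migrate "$DST_HOST" 6379 "$key" 0 5000)
        case "$reply" in
            OK|NOKEY) ;;   # NOKEY: the key expired or was deleted meanwhile
            *) echo "MIGRATE of '$key' in slot $i failed: $reply" >&2; exit 1 ;;
        esac
    done
    # 4. Assign the slot to the destination: destination, source, then the other master.
    redis-cli -h "$DST_HOST" cluster setslot "$i" node "$DST_ID"
    redis-cli -h "$SRC_HOST" cluster setslot "$i" node "$DST_ID"
    redis-cli -h "$OTHER_MASTER" cluster setslot "$i" node "$DST_ID"
done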
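The script above migrates one key per round trip, which can be slow for slots holding many keys. MIGRATE can also move several keys in one call: pass an empty string as the key and list the keys after a KEYS option (Redis 3.0.6 and later). Below is a minimal sketch of step 3 done that way, assuming `redis-cli` is on the PATH and every node listens on the same port; `migrate_slot_keys`, `REDIS_CLI`, `BATCH`, `PORT` and `TIMEOUT_MS` are hypothetical names. It stops and reports the reply when MIGRATE returns an error such as BUSYKEY.

```shell
#!/usr/bin/env bash
# Sketch of step 3 with batching: move all keys of one slot from SRC to DST,
# up to BATCH keys per MIGRATE call (the KEYS form needs Redis >= 3.0.6).
# REDIS_CLI, BATCH, PORT, TIMEOUT_MS and migrate_slot_keys are hypothetical
# names; PORT is assumed to be the same on both nodes.
REDIS_CLI=${REDIS_CLI:-redis-cli}
BATCH=${BATCH:-100}
PORT=${PORT:-6379}
TIMEOUT_MS=${TIMEOUT_MS:-5000}

migrate_slot_keys() {
    local slot=$1 src=$2 dst=$3 key reply
    local -a keys
    while true; do
        keys=()
        # Collect up to BATCH key names, one per line (names containing newlines are not handled).
        while IFS= read -r key || [ -n "$key" ]; do
            [ -n "$key" ] && keys+=("$key")
        done < <("$REDIS_CLI" -h "$src" -p "$PORT" cluster getkeysinslot "$slot" "$BATCH")
        [ "${#keys[@]}" -eq 0 ] && return 0   # the slot is empty on the source
        # Empty key name + KEYS: migrate the whole batch in one round trip.
        reply=$("$REDIS_CLI" -h "$src" -p "$PORT" migrate "$dst" "$PORT" "" 0 "$TIMEOUT_MS" KEYS "${keys[@]}")
        case "$reply" in
            OK|NOKEY) ;;   # NOKEY: none of the listed keys existed any more
            *) echo "MIGRATE failed for slot $slot: $reply" >&2; return 1 ;;
        esac
    done
}
```

For example, the inner `while` loop of the script could be replaced by `migrate_slot_keys "$i" "$SRC_HOST" "$DST_HOST" || exit 1`. MIGRATE also accepts a REPLACE option that overwrites an existing key on the target instead of failing with BUSYKEY; the sketch deliberately leaves it out, because whether overwriting is safe depends on why the key is already on the destination.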
answered Sep 24 '22 16:09 by Shudipta Sharma