 

(lettuce) READONLY You can't write against a read only slave

I need some help. Our service uses lettuce 5.1.6 and is deployed across a total of 22 Docker nodes.
Every time the service is deployed, a few of the Docker nodes start throwing ERROR: READONLY You can't write against a read only slave.
After the problematic Docker nodes are restarted, the error no longer appears.

  • redis server configuration:

8 master 8 slave
stop-writes-on-bgsave-error no
slave-serve-stale-data yes
slave-read-only yes
cluster-enabled yes
cluster-config-file "/data/server/redis-cluster/{port}/conf/node.conf"

  • lettuce configuration:
ClientResources res = DefaultClientResources.builder()
        .commandLatencyPublisherOptions(
                DefaultEventPublisherOptions.builder()
                        .eventEmitInterval(Duration.ofSeconds(5))
                        .build()
        )
        .build();
redisClusterClient = RedisClusterClient.create(res, REDIS_CLUSTER_URI);
redisClusterClient.setOptions(
        ClusterClientOptions.builder()
                .maxRedirects(99)
                .socketOptions(SocketOptions.builder().keepAlive(true).build())
                .topologyRefreshOptions(
                        ClusterTopologyRefreshOptions.builder()
                                .enableAllAdaptiveRefreshTriggers()
                                .build())
                .build());
RedisAdvancedClusterCommands<String, String> command = redisClusterClient.connect().sync();
command.setex("some key", 18000, "some value");
  • The Exception that appears:
io.lettuce.core.RedisCommandExecutionException: READONLY You can't write against a read only slave.
    at io.lettuce.core.ExceptionFactory.createExecutionException(ExceptionFactory.java:135)
    at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:122)
    at io.lettuce.core.cluster.ClusterFutureSyncInvocationHandler.handleInvocation(ClusterFutureSyncInvocationHandler.java:123)
    at io.lettuce.core.internal.AbstractInvocationHandler.invoke(AbstractInvocationHandler.java:80)
    at com.sun.proxy.$Proxy135.setex(Unknown Source)
    at com.xueqiu.infra.redis4.RedisClusterImpl.lambda$setex$164(RedisClusterImpl.java:1489)
    at com.xueqiu.infra.redis4.RedisClusterImpl$$Lambda$1422/1017847781.apply(Unknown Source)
    at com.xueqiu.infra.redis4.RedisClusterImpl.execute(RedisClusterImpl.java:526)
    at com.xueqiu.infra.redis4.RedisClusterImpl.executeTotal(RedisClusterImpl.java:491)
    at com.xueqiu.infra.redis4.RedisClusterImpl.setex(RedisClusterImpl.java:1489)
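
For reference on the server-side behaviour: with slave-read-only yes, a write sent to any node that is currently acting as a replica is rejected with exactly this message, no matter which client sends it. A minimal sketch against a single node, assuming io.lettuce.core.RedisClient and io.lettuce.core.api.StatefulRedisConnection, and using one replica address only as an example (10.10.28.3:25662):

RedisClient nodeClient = RedisClient.create("redis://10.10.28.3:25662"); // a replica node
try (StatefulRedisConnection<String, String> connection = nodeClient.connect()) {
    // A write against a read-only replica fails with:
    // RedisCommandExecutionException: READONLY You can't write against a read only slave.
    connection.sync().set("some key", "some value");
} finally {
    nodeClient.shutdown();
}
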
asked Jul 20 '20 by singgel

1 Answer

lettuce source code investigation

1. lettuce initialization: Partitions.java

    /**
     * Update the partition cache. Updates are necessary after the partition details have changed.
     */
    public void updateCache() {

        synchronized (partitions) {

            if (partitions.isEmpty()) {
                this.slotCache = EMPTY;
                this.nodeReadView = Collections.emptyList();
                return;
            }

            RedisClusterNode[] slotCache = new RedisClusterNode[SlotHash.SLOT_COUNT];
            List<RedisClusterNode> readView = new ArrayList<>(partitions.size());

            for (RedisClusterNode partition: partitions) {

                readView.add(partition);
                for (Integer integer: partition.getSlots()) {
                    slotCache[integer.intValue()] = partition;
                }
            }

            this.slotCache = slotCache;
            this.nodeReadView = Collections.unmodifiableCollection(readView);
        }
    }
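
Note that updateCache copies whatever slot ranges a node advertises straight into slotCache, without checking whether the node is a master or a slave. So if a node reports itself as a slave but still advertises slots (as the CLUSTER NODES output below shows), the client keeps routing those slots to it. A minimal sketch for inspecting what the client has cached for the failing key, reusing the redisClusterClient and the key from the question (SlotHash, Partitions and RedisClusterNode are lettuce classes):

    int slot = SlotHash.getSlot("some key"); // CRC16("some key") mod 16384
    RedisClusterNode target = redisClusterClient.getPartitions().getPartitionBySlot(slot);
    // If getFlags() contains SLAVE while this node is still the cached owner of the slot,
    // every write for the slot goes to a replica and fails with READONLY.
    System.out.printf("slot %d -> %s %s%n", slot, target.getUri(), target.getFlags());
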

2. lettuce command sending: PooledClusterConnectionProvider.java

    private CompletableFuture<StatefulRedisConnection<K, V>> getWriteConnection(int slot) {

        CompletableFuture<StatefulRedisConnection<K, V>> writer;// avoid races when reconfiguring partitions.
        synchronized (stateLock) {
            writer = writers[slot];
        }

        if (writer == null) {
            RedisClusterNode partition = partitions.getPartitionBySlot(slot);
            if (partition == null) {
                clusterEventListener.onUncoveredSlot(slot);
                return Futures.failed(new PartitionSelectorException("Cannot determine a partition for slot "+ slot + ".",
                        partitions.clone()));
            }

            // Use always host and port for slot-oriented operations. We don't want to get reconnected on a different
            // host because the nodeId can be handled by a different host.
            RedisURI uri = partition.getUri();
            ConnectionKey key = new ConnectionKey(Intent.WRITE, uri.getHost(), uri.getPort());

            ConnectionFuture<StatefulRedisConnection<K, V>> future = getConnectionAsync(key);

            return future.thenApply(connection -> {

                synchronized (stateLock) {
                    if (writers[slot] == null) {
                        writers[slot] = CompletableFuture.completedFuture(connection);
                    }
                }

                return connection;
            }).toCompletableFuture();
        }

        return writer;
    }
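
Also note that the writer connection for a slot is created lazily from the host and port of the cached partition and is then kept in writers[slot]. Once a slot resolves to a replica, every later write for that slot reuses that same connection until the partitions change, which is why only restarting the affected Docker node (a fresh topology load) made the error go away. A hedged sketch of the programmatic equivalent, again using redisClusterClient from the question:

    // Re-pull CLUSTER NODES and replace the cached partitions; the updated topology
    // rebuilds slotCache and should also invalidate the cached writer connections
    // (an assumption based on the code above, not verified for every lettuce version).
    redisClusterClient.refreshPartitions();
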

The routing principle of lettuce:

  1. The topology is loaded when the client starts, and the slot-to-node mapping is stored locally in the slotCache array (see the refresh sketch after this list).
  2. When a command is sent, the client computes CRC16 of the key to find the slot, looks the slot up in slotCache to get the owning node, and then obtains a connection to that node.
  3. Note that essentially all cluster-mode middleware follows this pattern: the client fetches the server-side topology and performs the routing calculation locally (compare the performance analysis of Kafka across data centers).
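
The configuration in the question only enables the adaptive refresh triggers, so apart from those triggers the client keeps the topology it loaded at startup. A hedged sketch of adding periodic refresh on top of the options from the question, so a client holding a stale view converges on its own instead of needing a restart (the 30-second period is an arbitrary example):

    redisClusterClient.setOptions(
            ClusterClientOptions.builder()
                    .maxRedirects(99)
                    .socketOptions(SocketOptions.builder().keepAlive(true).build())
                    .topologyRefreshOptions(
                            ClusterTopologyRefreshOptions.builder()
                                    .enableAllAdaptiveRefreshTriggers()
                                    // additionally re-read CLUSTER NODES on a fixed schedule
                                    .enablePeriodicRefresh(Duration.ofSeconds(30))
                                    .build())
                    .build());
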

redis cluster information troubleshooting

./bin/redis-cli -h 10.10.28.2 -p 25661 cluster info

cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:8
cluster_my_epoch:6
cluster_stats_messages_ping_sent:615483
cluster_stats_messages_pong_sent:610194
cluster_stats_messages_meet_sent:3
cluster_stats_messages_fail_sent:8
cluster_stats_messages_auth-req_sent:5
cluster_stats_messages_auth-ack_sent:2
cluster_stats_messages_update_sent:4
cluster_stats_messages_sent:1225699
cluster_stats_messages_ping_received:610188
cluster_stats_messages_pong_received:603593
cluster_stats_messages_meet_received:2
cluster_stats_messages_fail_received:4
cluster_stats_messages_auth-req_received:2
cluster_stats_messages_auth-ack_received:2
cluster_stats_messages_received:1213791

./bin/redis-cli -h 10.10.28.2 -p 25661 cluster nodes

5e9d0c185a2ba2fc9564495730c874bea76c15fa 10.10.28.3:25662@35662 slave 2281f330d771ee682221bc6c239afd68e6f20571 0 1595921769000 15 connected
79cb673db12199c32737b959cd82ec9963106558 10.10.25.2:25651@35651 master - 0 1595921770000 18 connected 4096-6143
2281f330d771ee682221bc6c239afd68e6f20571 10.10.28.2:25661@35661 myself,master - 0 1595921759000 15 connected 10240-12287
6a9ea568d6b49360afbb650c712bd7920403ba19 10.10.28.3:25686@35686 master - 0 1595921769000 14 connected 12288-14335
5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656@35656 master - 0 1595921771000 13 connected 14336-16383
f5148dba1127bd9bada8ecc39341a0b72ef25d8e 10.10.25.3:25652@35652 slave 79cb673db12199c32737b959cd82ec9963106558 0 1595921769000 18 connected
f6788b4829e601642ed4139548153830c430b932 10.10.26.3:25666@35666 master - 0 1595921769870 16 connected 8192-10239
f54cfebc12c69725f471d16133e7ca3a8567dc18 10.10.28.15:25687@35687 slave 6a9ea568d6b49360afbb650c712bd7920403ba19 0 1595921763000 14 connected
f09ad21effff245cae23c024a8a886f883634f5c 10.10.28.15:25667@35667 slave f6788b4829e601642ed4139548153830c430b932 0 1595921770870 16 connected
ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 10.10.25.3:25681@35681 master - 0 1595921773876 0 connected 0-2047
19c57214e4293b2e37d881534dcd55318fa96a70 10.10.50.16:25677@35677 slave 5f677e012808b09c67316f6ac5bdf0ec005cd598 0 1595921768000 17 connected
d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671@35671 master - 0 1595921773000 6 connected 2048-4095
068e3bc73c27782c49782d30b66aa8b1140666ce 10.10.27.3:25682@35682 slave ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 0 1595921771872 12 connected
e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672@35672 slave d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 0 1595921770000 6 connected
f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657@35657 slave 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 0 1595921762000 13 connected
5f677e012808b09c67316f6ac5bdf0ec005cd598 10.10.50.7:25676@35676 master - 0 1595921772873 17 connected 6144-8191

./bin/redis-cli -h 10.10.28.3 -p 25662 cluster nodes

f5148dba1127bd9bada8ecc39341a0b72ef25d8e 10.10.25.3:25652@35652 slave 79cb673db12199c32737b959cd82ec9963106558 0 1595921741000 18 connected
f6788b4829e601642ed4139548153830c430b932 10.10.26.3:25666@35666 master - 0 1595921744000 16 connected 8192-10239
f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657@35657 slave 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 0 1595921740000 13 connected
5f677e012808b09c67316f6ac5bdf0ec005cd598 10.10.50.7:25676@35676 master - 0 1595921743127 17 connected 6144-8191
79cb673db12199c32737b959cd82ec9963106558 10.10.25.2:25651@35651 master - 0 1595921743000 18 connected 4096-6143
2281f330d771ee682221bc6c239afd68e6f20571 10.10.28.2:25661@35661 master - 0 1595921744129 15 connected 10240-12287
f09ad21effff245cae23c024a8a886f883634f5c 10.10.28.15:25667@35667 slave f6788b4829e601642ed4139548153830c430b932 0 1595921740000 16 connected
f54cfebc12c69725f471d16133e7ca3a8567dc18 10.10.28.15:25687@35687 slave 6a9ea568d6b49360afbb650c712bd7920403ba19 0 1595921745130 14 connected
5e9d0c185a2ba2fc9564495730c874bea76c15fa 10.10.28.3:25662@35662 myself,slave 2281f330d771ee682221bc6c239afd68e6f20571 0 1595921733000 5 connected 0-1820
068e3bc73c27782c49782d30b66aa8b1140666ce 10.10.27.3:25682@35682 slave ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 0 1595921744000 12 connected
d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671@35671 master - 0 1595921739000 6 connected 2048-4095
5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656@35656 master - 0 1595921742000 13 connected 14336-16383
ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 10.10.25.3:25681@35681 master - 0 1595921746131 0 connected 1821-2047
6a9ea568d6b49360afbb650c712bd7920403ba19 10.10.28.3:25686@35686 master - 0 1595921747133 14 connected 12288-14335
19c57214e4293b2e37d881534dcd55318fa96a70 10.10.50.16:25677@35677 slave 5f677e012808b09c67316f6ac5bdf0ec005cd598 0 1595921742126 17 connected
e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672@35672 slave d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 0 1595921745000 6 connected

./bin/redis-cli -h 10.10.49.9 -p 25672 cluster nodes

d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671@35671 master - 0 1595921829000 6 connected 2048-4095
79cb673db12199c32737b959cd82ec9963106558 10.10.25.2:25651@35651 master - 0 1595921830000 18 connected 4096-6143
ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 10.10.25.3:25681@35681 master - 0 1595921830719 0 connected 0-1820
f54cfebc12c69725f471d16133e7ca3a8567dc18 10.10.28.15:25687@35687 slave 6a9ea568d6b49360afbb650c712bd7920403ba19 0 1595921827000 14 connected
5f677e012808b09c67316f6ac5bdf0ec005cd598 10.10.50.7:25676@35676 master - 0 1595921827000 17 connected 6144-8191
2281f330d771ee682221bc6c239afd68e6f20571 10.10.28.2:25661@35661 master - 0 1595921822000 15 connected 10240-12287
5e9d0c185a2ba2fc9564495730c874bea76c15fa 10.10.28.3:25662@35662 slave 2281f330d771ee682221bc6c239afd68e6f20571 0 1595921828714 15 connected
068e3bc73c27782c49782d30b66aa8b1140666ce 10.10.27.3:25682@35682 slave ff5f5a56a7866f32e84ec89482aabd9ca1f05e20 0 1595921832721 12 connected
6a9ea568d6b49360afbb650c712bd7920403ba19 10.10.28.3:25686@35686 master - 0 1595921825000 14 connected 12288-14335
f5148dba1127bd9bada8ecc39341a0b72ef25d8e 10.10.25.3:25652@35652 slave 79cb673db12199c32737b959cd82ec9963106558 0 1595921830000 18 connected
19c57214e4293b2e37d881534dcd55318fa96a70 10.10.50.16:25677@35677 slave 5f677e012808b09c67316f6ac5bdf0ec005cd598 0 1595921829716 17 connected
e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672@35672 myself,slave d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 0 1595921832000 4 connected 1821-2047
f09ad21effff245cae23c024a8a886f883634f5c 10.10.28.15:25667@35667 slave f6788b4829e601642ed4139548153830c430b932 0 1595921826711 16 connected
f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657@35657 slave 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 0 1595921829000 13 connected
f6788b4829e601642ed4139548153830c430b932 10.10.26.3:25666@35666 master - 0 1595921831720 16 connected 8192-10239
5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656@35656 master - 0 1595921827714 13 connected 14336-16383

./bin/redis-trib.rb check 10.10.30.9:25671

>>> Performing Cluster Check (using node 10.10.30.9:25671)
M: d8b4f99e0f9961f2e866b92e7351760faa3e0f2b 10.10.30.9:25671
   slots:2048-4095 (2048 slots) master
   1 additional replica(s)
S: e8b0311aeec4e3d285028abc377f0c277f9a5c74 10.10.49.9:25672
   slots: (0 slots) slave
   ········
   ········
S: f03bc2ca91b3012f4612ecbc8c611c9f4a0e1305 10.10.27.3:25657
   slots: (0 slots) slave
   replicates 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757
M: 5a12dd423370e6f4085e593f9cd0b3a4ddfa9757 10.10.27.2:25656
   slots:14336-16383 (2048 slots) master
   1 additional replica(s)
[ERR] Nodes don't agree about configuration!
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Be suspicious of everything and stay vigilant; diligence makes up for shortcomings.

  1. At first I assumed the cluster itself was healthy: in production most of the nodes were fine, only a few had the problem, and restarting them fixed it.
  2. The initial checks were also run against healthy nodes, so nothing looked wrong. Even though the logs showed that the topology reported by a few nodes was inconsistent, the problem is hard to see without the client-side slot mapping (a sketch for dumping it follows this list).
  3. Comparing the outputs shows that some slots are mapped to slave nodes, e.g. 10.10.28.3:25662 reports itself as myself,slave yet still claims slots 0-1820.
  4. Running redis-trib.rb check then confirms the cluster is in a bad state ([ERR] Nodes don't agree about configuration!), which is discussed in the related open-source issues.
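
A hedged sketch for dumping the client-side view so it can be compared with the CLUSTER NODES output above, reusing redisClusterClient from the question:

    for (RedisClusterNode node: redisClusterClient.getPartitions()) {
        // A node whose flags contain SLAVE but whose slot list is not empty is the
        // smoking gun: slotCache routes writes for those slots to a replica.
        System.out.printf("%s %s slots=%d %s%n",
                node.getNodeId(), node.getUri(), node.getSlots().size(), node.getFlags());
    }
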
answered Nov 03 '22 by singgel