I see strange behavior in a Redis cluster: it works totally fine under heavy load, but starts to run with a ~50% timeout rate and unstable response times under low load.
We see the same pattern each day during periods of low load.
Any ideas what could cause such a strange pattern? Maybe some maintenance work this Redis Cluster starts doing at low-load times, like slot rebalancing? Please recommend any settings or aspects to check.
Versions: Redis 2.0.7, Jedis 2.8.1
Configuration: 3 physical nodes with 9 master processes and 18 slaves.
JedisCluster Timeout = 5ms.
Load is 100% writes with setex.
These graphs are for JedisCluster response times, not actual Redis Cluster times. The "Sets" line here is actually successful sets, not the total count.
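For context, each write is a plain SETEX (set with a TTL); its redis-cli equivalent would look roughly like the line below, where the key, TTL, and value are placeholders rather than our real data:

redis08 ~ $ redis-cli -c -h 10.201.12.215 -p 9006 setex some:key 600 "some-value"
OK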
As demand on your clusters changes, you might decide to improve performance or reduce costs by changing the number of shards in your Redis (cluster mode enabled) cluster. We recommend using online horizontal scaling to do so, because it allows your cluster to continue serving requests during the scaling process.
Redis scales horizontally with a deployment topology called Redis Cluster.
Redis Cluster supports only one database (database 0), which is usually fine if you have a big dataset, whereas standalone Redis supports multiple databases. A Redis Cluster client must also support redirection (MOVED/ASK), while a client for standalone Redis doesn't need it.
Redis Cluster partitions the key space using hash slots rather than consistent hashing: the slot a particular key is assigned to is CRC16(key) mod 16384, and the 16384 slots are divided among the different masters in the cluster.
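Because the slot is just CRC16(key) mod 16384, you can check which slot (and therefore which master) any key lands on with CLUSTER KEYSLOT; the key name below is a placeholder, and the host/port are reused from the examples in this post:

redis08 ~ $ redis-cli -h 10.201.12.215 -p 9006 cluster keyslot some:key      # slot number, 0-16383
redis08 ~ $ redis-cli -h 10.201.12.215 -p 9006 cluster nodes | grep master   # which master owns which slot ranges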
Finally, I found that it looks like a network issue.
redis08(10.201.12.214) ~ $ redis-benchmark -h 10.201.12.215 -p 9006
====== PING_INLINE ======
100000 requests completed in 91.42 seconds
50 parallel clients
3 bytes payload
keep alive: 1
0.00% <= 11 milliseconds
redis09(10.201.12.215) ~ $ redis-benchmark -h 10.201.12.215 -p 9006
====== PING_INLINE ======
100000 requests completed in 1.41 seconds
50 parallel clients
3 bytes payload
keep alive: 1
99.46% <= 1 milliseconds
redis08 ~ $ ping lga-redis09
PING redis09 (10.201.12.215) 56(84) bytes of data.
64 bytes from redis09 (10.201.12.215): icmp_seq=1 ttl=64 time=10.7 ms
Looking at collectd's "if_octets" metric, we see enormous network activity on the network interfaces during this period of low write activity; the nighttime network load is roughly 10x the daytime load.
And it is caused by the Redis nodes themselves, which start to actively exchange data with each other during this low-load period. In iptraf's top-connections output this inter-node traffic dominates at night, while during the daytime the top connections in the same iptraf report belong entirely to actual Redis clients with a good write load.
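A quick way to confirm that this extra traffic is replication rather than client traffic is to look at the resync counters each master exposes (same host/port as in the benchmark above):

redis08 ~ $ redis-cli -h 10.201.12.215 -p 9006 info replication
redis08 ~ $ redis-cli -h 10.201.12.215 -p 9006 info stats | grep sync
# sync_full grows with every full resync; sync_partial_err counts partial resyncs
# that could not be served from the backlog and fell back to a full resync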
Finally, we found that we had issues with replication. Sometimes the replication backlog buffer was not big enough and slaves started a full resync. It looks like this night load was full resync attempts plus a low repl-timeout value, resulting in never-ending replication attempts. Why this replication affected the low-load nights so significantly and didn't affect the daytime, I don't know; I see no option that would make Redis retry more often at night or anything like that. If it's interesting, we fixed the never-ending replication by increasing the obvious settings (sketch after the list):
repl-backlog-size
repl-timeout
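A minimal sketch of the change, assuming it is applied to every master; the values below are illustrative, not the exact ones we ended up with:

redis08 ~ $ redis-cli -h 10.201.12.215 -p 9006 config set repl-backlog-size 268435456   # 256mb backlog so slaves can partially resync after short disconnects
redis08 ~ $ redis-cli -h 10.201.12.215 -p 9006 config set repl-timeout 120              # more time before a slow sync is considered failed
The same values go into redis.conf (repl-backlog-size 256mb, repl-timeout 120) so they survive a restart.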