How to increase Redis performance when 100% CPU? Sharding? Fastest .Net Client?

Due to massive load increases on our website, Redis is now struggling at peak load: the Redis server instance reaches 100% CPU (on one of eight cores), resulting in timeouts.

We've updated our client software to ServiceStack V3 (coming from BookSleeve 1.1.0.4) and upgraded the Redis server to 2.8.11 (coming from 2.4.x). I chose ServiceStack because of Harbour.RedisSessionStateStore, which uses ServiceStack.Redis. Before that we used AngiesList.Redis together with BookSleeve, but we hit 100% CPU with that setup too.

We have eight Redis servers configured as a master/slave tree, though a single separate server handles session state. The others are for data cache: one master with two master/slaves, each connected to two slaves.

The servers hold about 600 client connections at peak when they start to get clogged at 100% CPU.

What can we do to increase performance?

Would sharding and/or switching to the StackExchange.Redis client help (to my knowledge there is no session state client available for it)?

Or could it be something else? The session server also hits 100% and it is not connected to any other servers (data and network throughput are low).

Update 1: Analysis of redis-cli INFO

Here's the output of the INFO command after one night of running Redis 2.8.

# Server
redis_version:2.8.11
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:7a57b118eb75b37f
redis_mode:standalone
os:Linux 2.6.32-431.11.2.el6.x86_64 x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.4.7
process_id:5843
run_id:d5bb838857d61a9673e36e5bf608fad5a588ac5c
tcp_port:6379
uptime_in_seconds:152778
uptime_in_days:1
hz:10
lru_clock:10765770
config_file:/etc/redis/6379.conf

# Clients
connected_clients:299
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:80266784
used_memory_human:76.55M
used_memory_rss:80719872
used_memory_peak:1079667208
used_memory_peak_human:1.01G
used_memory_lua:33792
mem_fragmentation_ratio:1.01
mem_allocator:jemalloc-3.2.0

# Persistence
loading:0
rdb_changes_since_last_save:70245
rdb_bgsave_in_progress:0
rdb_last_save_time:1403274022
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:0
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok

# Stats
total_connections_received:3375
total_commands_processed:30975281
instantaneous_ops_per_sec:163
rejected_connections:0
sync_full:10
sync_partial_ok:0
sync_partial_err:5
expired_keys:8059370
evicted_keys:0
keyspace_hits:97513
keyspace_misses:46044
pubsub_channels:2
pubsub_patterns:0
latest_fork_usec:22040

# Replication
role:master
connected_slaves:2
slave0:ip=xxx.xxx.xxx.xxx,port=6379,state=online,offset=272643782764,lag=1
slave1:ip=xxx.xxx.xxx.xxx,port=6379,state=online,offset=272643784216,lag=1
master_repl_offset:272643811961
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:272642763386
repl_backlog_histlen:1048576

# CPU
used_cpu_sys:20774.19
used_cpu_user:2458.50
used_cpu_sys_children:304.17
used_cpu_user_children:1446.23

# Keyspace
db0:keys=77863,expires=77863,avg_ttl=3181732
db6:keys=11855,expires=11855,avg_ttl=3126767
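
A few averages can be derived from the counters above. The following is a minimal sketch, using only values copied from this INFO output; the helper names `parse_info` and `derive` are made up for illustration. Note that these are averages over the whole uptime, so they say nothing about what happens at peak.

```python
# Values copied verbatim from the INFO output above.
INFO_TEXT = """\
uptime_in_seconds:152778
total_commands_processed:30975281
keyspace_hits:97513
keyspace_misses:46044
used_cpu_sys:20774.19
used_cpu_user:2458.50
"""


def parse_info(text):
    """Turn 'key:value' lines into a dict of floats, skipping headers and blanks."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        try:
            fields[key] = float(value)
        except ValueError:
            pass  # non-numeric fields (e.g. redis_version) are ignored here
    return fields


def derive(fields):
    """Compute whole-uptime averages from the cumulative counters."""
    uptime = fields["uptime_in_seconds"]
    cpu = fields["used_cpu_sys"] + fields["used_cpu_user"]
    lookups = fields["keyspace_hits"] + fields["keyspace_misses"]
    return {
        "avg_ops_per_sec": fields["total_commands_processed"] / uptime,
        "avg_cpu_fraction": cpu / uptime,  # share of one core, averaged over uptime
        "sys_share_of_cpu": fields["used_cpu_sys"] / cpu,
        "hit_ratio": fields["keyspace_hits"] / lookups,
    }


metrics = derive(parse_info(INFO_TEXT))
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

One thing that stands out is that most of the CPU time is system time rather than user time, which may point at network or kernel overhead more than at expensive commands. That is only a reading of the averages, not a diagnosis.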

Update 2: twemproxy (Sharding)

I've discovered an interesting component called twemproxy. As I understand it, this component can shard keys across multiple Redis instances.

Would this help relieve the CPU?

It would save us a lot of programming time, but it would still take some effort to configure 3 extra instances on each server. So I'm hoping somebody can confirm or debunk this solution before we put in the work.
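
To make the idea concrete, here is a rough sketch of hash-based key sharding. It is not twemproxy's implementation (its hashing and distribution are configurable and differ from this); the instance addresses and the `shard_for` helper are hypothetical. The point is only that each key maps deterministically to one instance, so command processing is spread over several Redis processes instead of one.

```python
import zlib
from collections import Counter

# Hypothetical instance addresses, for illustration only.
SHARDS = ["127.0.0.1:6379", "127.0.0.1:6380", "127.0.0.1:6381", "127.0.0.1:6382"]


def shard_for(key, shards=SHARDS):
    """Pick a shard by hashing the key with CRC32 and taking it modulo the shard count."""
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


# Distribute 10,000 made-up session keys and count how many land on each shard.
counts = Counter(shard_for(f"session:{i}") for i in range(10000))
for shard in SHARDS:
    print(shard, counts[shard])
```

Sharding like this spreads the work; it does not reduce it. If the load comes from something the application is doing needlessly, every shard will simply carry its share of that.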

210

asked Jun 19 '14 22:06

baskabas

2 Answers

The first thing to do would be to look at slowlog get 50 (or pick any number of rows); this shows the last 50 commands that took non-trivial amounts of time. It could be that some of the things you are doing are simply taking too long. I get worried if I see anything in slowlog; I usually see items every few days. If you are seeing lots of items constantly, then you need to investigate what you are actually doing on the server. One killer thing to never do is keys, but there are other things.
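
If the slowlog turns out to be busy, grouping the entries by command makes the worst offenders easier to spot. A minimal sketch follows, assuming the entries have already been fetched and have the four fields Redis 2.8 reports per entry (id, unix timestamp, execution time in microseconds, command arguments). The sample entries and the `summarize` helper are invented for illustration; they are not taken from the asker's server.

```python
from collections import defaultdict

# Hypothetical entries for illustration, shaped like SLOWLOG GET results:
# (entry id, unix timestamp, execution time in microseconds, command arguments)
SAMPLE_ENTRIES = [
    (3, 1403270000, 48200, ["KEYS", "session:*"]),
    (2, 1403269000, 15100, ["SMEMBERS", "big:set"]),
    (1, 1403268000, 52900, ["KEYS", "cache:*"]),
]


def summarize(entries):
    """Aggregate count, total and max execution time per command name."""
    totals = defaultdict(lambda: {"count": 0, "total_usec": 0, "max_usec": 0})
    for _entry_id, _timestamp, usec, args in entries:
        stats = totals[args[0].upper()]
        stats["count"] += 1
        stats["total_usec"] += usec
        stats["max_usec"] = max(stats["max_usec"], usec)
    # Sort so the command that consumed the most server time comes first.
    return sorted(totals.items(), key=lambda kv: kv[1]["total_usec"], reverse=True)


summary = summarize(SAMPLE_ENTRIES)
for command, stats in summary:
    print(command, stats["count"], stats["total_usec"], stats["max_usec"])
```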

The next thing to do is: cache. Requests that get short-circuited before they hit the back end are free. We use redis extensively, but that doesn't mean we ignore local memory too.
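
As an illustration of short-circuiting requests in local memory, here is a small sketch of an in-process TTL cache sitting in front of a slower lookup. The `LocalCache` class and the `fake_redis_get` stand-in are hypothetical; a real setup would call the Redis client there. It is not thread-safe and may serve values that are stale for up to the TTL.

```python
import time


class LocalCache:
    """Tiny in-process TTL cache in front of a slower backend lookup."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # called only on a miss or after expiry
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._items = {}             # key -> (expires_at, value)
        self.backend_calls = 0

    def get(self, key):
        now = self._clock()
        hit = self._items.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]            # served locally, no round trip
        self.backend_calls += 1
        value = self._fetch(key)
        self._items[key] = (now + self._ttl, value)
        return value


def fake_redis_get(key):
    """Stand-in for a real Redis GET, for illustration only."""
    return f"value-for-{key}"


cache = LocalCache(fake_redis_get, ttl_seconds=30)
for _ in range(1000):
    cache.get("user:1")
print("backend calls:", cache.backend_calls)
```

In this run only the first of the 1000 lookups reaches the backend; the rest are answered from local memory.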

183

answered Oct 28 '22 09:10

Marc Gravell

We found an issue inside our application: updates to cached data were communicated to each web server's local memory cache through a Redis channel subscription.

Every time the local cache was flushed, items expired, or items were updated, messages were sent to all 35 web servers, which in turn started updating more items, and so on.

Disabling the messages for updated keys improved our situation tenfold.

Network bandwidth dropped from 1.2 Gbps to 200 Mbps, and CPU utilization is now 40% at 150% of the load we had so far, at a moment of extreme calculations and updates.

answered Oct 28 '22 08:10

baskabas

Tags: performance, redis, sharding, stackexchange.redis, servicestack.redis
