
How to find the root cause of high CPU usage of Kafka brokers?

Tags:

apache-kafka

I am in charge of operating two Kafka clusters (one for prod and one for our dev environment). The setups are mostly similar, but the dev environment has no SASL/SSL and uses just 4 brokers instead of 8. Each broker is assigned to a dedicated Google Kubernetes node with 4 vCPUs and 26 GB RAM.

In our dev environment we get roughly 1000 messages in / sec, and each of the 4 brokers pretty consistently uses 3 of the 4 available CPU cores (75% CPU usage).

In our prod environment we get about 1500 messages in / sec, and the CPU usage is also 3 out of 4 cores there.

CPU usage seems to be a bottleneck for us, and I'd like to know how I can do CPU profiling so that I know exactly what is causing the high CPU usage. Since it's relatively consistent, I guess it could be our snappy compression.

I am interested in any ideas on how I could investigate the cause of the high CPU usage and how I could tune my cluster to reduce it.

  • Apache Kafka version: 2.1 (CPU load used to be similar on Kafka 0.11.x too)

  • Dev Cluster (Snappy compression, no SASL/SSL, 4 Brokers): 1000 messages in / sec, 3 CPU cores consistent usage

  • Prod cluster (Snappy compression, SASL/SSL, 8 Brokers): 1500 messages in / sec, 3 CPU cores consistent usage
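To make the two clusters easier to compare, the figures above can be normalised per broker. The following is only a rough sketch: the function name `per_broker_load` is made up for illustration, and it assumes incoming messages are spread evenly across brokers while ignoring replication traffic, consumer fetches, and message size, none of which are given here.

```python
def per_broker_load(messages_per_sec, brokers, cores_used_per_broker):
    """Return (messages/sec per broker, CPU core-milliseconds per incoming message).

    Assumes incoming messages are spread evenly across all brokers.
    """
    msgs_per_broker = messages_per_sec / brokers
    # One busy core provides 1000 core-ms of CPU time per second.
    core_ms_per_msg = cores_used_per_broker * 1000 / msgs_per_broker
    return msgs_per_broker, core_ms_per_msg


dev = per_broker_load(1000, 4, 3)   # dev: 1000 msg/s, 4 brokers, 3 cores each
prod = per_broker_load(1500, 8, 3)  # prod: 1500 msg/s, 8 brokers, 3 cores each

print("dev :", dev)
print("prod:", prod)
```

Under these assumptions each dev broker handles about 250 messages in / sec and each prod broker about 187.5, which works out to roughly 12 and 16 core-milliseconds of CPU per incoming message.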

Side note: I have already made sure that producers send their messages snappy-compressed. I have access to all JMX metrics, but couldn't find anything useful for explaining the CPU usage.

I already have metrics flowing into my Prometheus (this is where I got the CPU usage stats from, too). The problem is that the container's CPU usage doesn't tell me WHY it is that high. I need more granularity, e.g. what the CPU cycles are being spent on (compression? broker communication? SASL/SSL?).

kentor asked Mar 01 '19 22:03




1 Answer

If you have access to JMX metrics, you are almost done with profiling CPU. All you have to do is install Prometheus and Grafana, store the metrics in Prometheus, and monitor them with Grafana. You can find the complete steps in Monitoring Kafka

Grafana Dashboard for cluster monitoring
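If the dashboards only show total CPU, one possible way to get a per-thread breakdown is to aggregate the CPU times reported in a JVM thread dump of the broker. The sketch below is an assumption-laden illustration, not a ready-made tool: it assumes the broker runs on a JDK 11+ JVM whose thread dumps (e.g. from `jstack <pid>`) include a `cpu=...ms` field on each thread header line, and the sample lines, thread names, and the function name `cpu_by_thread_group` are invented for the example.

```python
import re
from collections import defaultdict

# Assumed JDK 11+ thread header format: "name" #id ... cpu=12.34ms elapsed=56.78s ...
THREAD_LINE = re.compile(r'^"(?P<name>[^"]+)".*?\bcpu=(?P<cpu>[\d.]+)ms')


def cpu_by_thread_group(dump_text):
    """Sum cumulative CPU milliseconds per thread group, largest first."""
    totals = defaultdict(float)
    for line in dump_text.splitlines():
        match = THREAD_LINE.match(line)
        if not match:
            continue  # stack frames, blank lines, headers without cpu=
        # Replace numeric ids with "N" so threads of one pool share a key.
        group = re.sub(r"\d+", "N", match.group("name"))
        totals[group] += float(match.group("cpu"))
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical excerpt of a thread dump; names and values are made up.
SAMPLE = '''\
"kafka-request-handler-0" #50 daemon prio=5 os_prio=0 cpu=1200.50ms elapsed=300.00s tid=0x01 nid=0x10 waiting on condition
"kafka-request-handler-1" #51 daemon prio=5 os_prio=0 cpu=800.50ms elapsed=300.00s tid=0x02 nid=0x11 runnable
"kafka-network-thread-0-ListenerName(PLAINTEXT)-PLAINTEXT-0" #40 prio=5 os_prio=0 cpu=500.00ms elapsed=300.00s tid=0x03 nid=0x12 runnable
"GC Thread#0" os_prio=0 cpu=250.00ms elapsed=300.00s tid=0x04 nid=0x13 runnable
'''

result = cpu_by_thread_group(SAMPLE)
for group, cpu_ms in result.items():
    print(f"{cpu_ms:10.2f} ms  {group}")
```

The `cpu=` values are cumulative since each thread started, so to see where CPU is going right now you would take two dumps some seconds apart and subtract the totals. Whether the busiest group turns out to be request handlers, network threads, or GC would hint at compression and request processing, SSL and network I/O, or memory pressure respectively, but that mapping is a heuristic rather than a certainty.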

Note: If you suspect snappy compression, maybe this performance test can help you

Update:

According to Confluent's documentation, SSL can account for a large share of the CPU usage:

Note that if SSL is enabled, the CPU requirements can be significantly higher (the exact details depend on the CPU type and JVM implementation).

You should choose a modern processor with multiple cores. Common clusters utilize 24 core machines.

If you need to choose between faster CPUs or more cores, choose more cores. The extra concurrency that multiple cores offers will far outweigh a slightly faster clock speed.

Amin answered Sep 17 '22 08:09