100% cpu usage by all kafka brokers

Tags:

apache-kafka

Cross-posting here from https://issues.apache.org/jira/browse/KAFKA-7925 since no one has replied there yet.

Issue: I am seeing constant 100% CPU usage on all brokers in our Kafka cluster, even without any clients connected to any broker. When this happens, no client is able to connect to the Kafka brokers and the connections keep timing out. It is becoming a blocker for the deployment now.

I keep seeing the exception below in the server logs:

Exception

java.net.SocketTimeoutException: Failed to connect within 30000 ms
at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
at kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:190)
at kafka.server.AbstractFetcherThread.kafka$server$AbstractFetcherThread$$processFetchRequest(AbstractFetcherThread.scala:241)
at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:130)
at kafka.server.AbstractFetcherThread$$anonfun$maybeFetch$1.apply(AbstractFetcherThread.scala:129)
at scala.Option.foreach(Option.scala:257)
at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:129)
at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)
[2019-02-14 09:20:00,617] INFO [ReplicaFetcher replicaId=1, leaderId=6, fetcherId=0] Error sending fetch request (sessionId=841897464, epoch=INITIAL) to node 6: java.net.SocketTimeoutException: Failed to connect within 30000 ms. (org.apache.kafka.clients.FetchSessionHandler)
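To see which leaders the replica fetchers cannot reach across a whole log file, these INFO lines can be tallied per (replicaId, leaderId) pair. This is a minimal sketch that assumes every failure is logged with exactly the `[ReplicaFetcher replicaId=..., leaderId=..., fetcherId=...] Error sending fetch request` prefix shown above; the function name is hypothetical.

```python
import re
from collections import Counter

# Assumes the exact fetcher prefix and message text from the broker log line above.
FETCH_ERROR = re.compile(
    r"\[ReplicaFetcher replicaId=(\d+), leaderId=(\d+), fetcherId=\d+\] "
    r"Error sending fetch request"
)


def count_fetch_errors(lines):
    """Count 'Error sending fetch request' lines per (replicaId, leaderId) pair."""
    counts = Counter()
    for line in lines:
        match = FETCH_ERROR.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts
```

Passing an open log file (for example `count_fetch_errors(open(path))`, with `path` pointing at the broker's server log) and calling `.most_common()` on the result lists the leaders with the most failed fetches first.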

I am seeing a lot of connections with other brokers in the CLOSE_WAIT state (see below). Looking at per-thread CPU usage, the threads 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0', 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1' and 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2' are taking up more than 90% of the CPU time over a 60 s interval.
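Per-thread CPU figures like these typically come from a tool such as `top -H -p <pid>` (an assumption; the post does not say which tool was used), which reports Linux thread IDs in decimal. A HotSpot thread dump taken with `jstack <pid>` tags each thread header with the same native ID as `nid=0x...` in hexadecimal, so converting the ID is how a busy thread is matched to its stack. A minimal sketch; the helper names are hypothetical.

```python
def tid_to_nid(tid: int) -> str:
    """Render a decimal Linux thread ID (as shown by `top -H`) in jstack's nid form."""
    return f"nid={tid:#x}"


def find_thread_header(dump_text: str, tid: int):
    """Return the thread-dump header line whose nid matches `tid`, or None."""
    needle = tid_to_nid(tid) + " "
    for line in dump_text.splitlines():
        # Thread headers start with the quoted thread name. Padding the line with a
        # trailing space keeps nid=0xff from also matching nid=0xff0.
        if line.startswith('"') and needle in line + " ":
            return line
    return None
```

The returned header carries the quoted thread name, which can then be compared with names such as 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0'.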

Setup details:

Java version:
openjdk 11.0.2 2019-01-15
OpenJDK Runtime Environment 18.9 (build 11.0.2+9)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.2+9, mixed mode)
Kafka version: v2.1.0

We have Kerberos authentication and simple ACL-based authorization set up in the cluster.

Connections (lsof output for the broker process):

java 144319 kafkagod 88u IPv4 3063266 0t0 TCP *:35395 (LISTEN)
java 144319 kafkagod 89u IPv4 3063267 0t0 TCP *:9144 (LISTEN)
java 144319 kafkagod 104u IPv4 3064219 0t0 TCP mwkafka-prod-02.tbd:47292->mwkafka-zk-prod-05.tbd:2181 (ESTABLISHED)
java 144319 kafkagod 2003u IPv4 3055115 0t0 TCP *:9092 (LISTEN)
java 144319 kafkagod 2013u IPv4 7220110 0t0 TCP mwkafka-prod-02.tbd:60724->mwkafka-zk-prod-04.dr:2181 (ESTABLISHED)
java 144319 kafkagod 2020u IPv4 30012904 0t0 TCP mwkafka-prod-02.tbd:38988->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2021u IPv4 30012961 0t0 TCP mwkafka-prod-02.tbd:58420->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2027u IPv4 30015723 0t0 TCP mwkafka-prod-02.tbd:58398->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2028u IPv4 30015630 0t0 TCP mwkafka-prod-02.tbd:36248->mwkafka-prod-02.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2030u IPv4 30015726 0t0 TCP mwkafka-prod-02.tbd:39012->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2031u IPv4 30013619 0t0 TCP mwkafka-prod-02.tbd:38986->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2032u IPv4 30015604 0t0 TCP mwkafka-prod-02.tbd:36246->mwkafka-prod-02.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2033u IPv4 30012981 0t0 TCP mwkafka-prod-02.tbd:36924->mwkafka-prod-01.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2034u IPv4 30012967 0t0 TCP mwkafka-prod-02.tbd:39036->mwkafka-prod-02.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2035u IPv4 30012898 0t0 TCP mwkafka-prod-02.tbd:36866->mwkafka-prod-01.dr:9092 (FIN_WAIT2)
java 144319 kafkagod 2036u IPv4 30004729 0t0 TCP mwkafka-prod-02.tbd:36882->mwkafka-prod-01.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2037u IPv4 30004914 0t0 TCP mwkafka-prod-02.tbd:58426->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2038u IPv4 30015651 0t0 TCP mwkafka-prod-02.tbd:36884->mwkafka-prod-01.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2039u IPv4 30012966 0t0 TCP mwkafka-prod-02.tbd:58422->mwkafka-prod-01.nyc:9092 (ESTABLISHED)
java 144319 kafkagod 2040u IPv4 30005643 0t0 TCP mwkafka-prod-02.tbd:36252->mwkafka-prod-02.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2041u IPv4 30012944 0t0 TCP mwkafka-prod-02.tbd:36286->mwkafka-prod-02.dr:9092 (ESTABLISHED)
java 144319 kafkagod 2042u IPv4 30012973 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51924 (ESTABLISHED)
java 144319 kafkagod 2043u sock 0,7 0t0 30012463 protocol: TCP
java 144319 kafkagod 2044u IPv4 30012979 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39994 (ESTABLISHED)
java 144319 kafkagod 2045u IPv4 30012899 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34548 (ESTABLISHED)
java 144319 kafkagod 2046u sock 0,7 0t0 30003437 protocol: TCP
java 144319 kafkagod 2047u IPv4 30012980 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38120 (ESTABLISHED)
java 144319 kafkagod 2048u sock 0,7 0t0 30012546 protocol: TCP
java 144319 kafkagod 2049u IPv4 30005418 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39686 (CLOSE_WAIT)
java 144319 kafkagod 2050u IPv4 30009977 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34552 (ESTABLISHED)
java 144319 kafkagod 2060u sock 0,7 0t0 30003439 protocol: TCP
java 144319 kafkagod 2061u IPv4 30012906 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51862 (ESTABLISHED)
java 144319 kafkagod 2069u IPv4 30005642 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34570 (ESTABLISHED)
java 144319 kafkagod 2073u sock 0,7 0t0 30003440 protocol: TCP
java 144319 kafkagod 2086u IPv4 30005644 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51878 (ESTABLISHED)
java 144319 kafkagod 2090u sock 0,7 0t0 30012553 protocol: TCP
java 144319 kafkagod 2093u sock 0,7 0t0 30012502 protocol: TCP
java 144319 kafkagod 2097u sock 0,7 0t0 30012531 protocol: TCP
java 144319 kafkagod 2104u IPv4 30005670 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34646 (ESTABLISHED)
java 144319 kafkagod 2105u IPv4 30012933 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38106 (ESTABLISHED)
java 144319 kafkagod 2106u IPv4 30012565 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34366 (CLOSE_WAIT)
java 144319 kafkagod 2114u IPv4 30012958 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39970 (ESTABLISHED)
java 144319 kafkagod 2115u sock 0,7 0t0 30012569 protocol: TCP
java 144319 kafkagod 2117u sock 0,7 0t0 30012571 protocol: TCP
java 144319 kafkagod 2118u IPv4 30012959 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39972 (ESTABLISHED)
java 144319 kafkagod 2120u IPv4 30012575 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37868 (CLOSE_WAIT)
java 144319 kafkagod 2121u IPv4 30012960 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39974 (ESTABLISHED)
java 144319 kafkagod 2122u IPv4 30012577 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39704 (CLOSE_WAIT)
java 144319 kafkagod 2127u IPv4 29477410 0t0 TCP mwkafka-prod-02.tbd:58804->u-sonar-sonarsec.sdlb:8826 (ESTABLISHED)
java 144319 kafkagod 2128u IPv4 30012579 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39708 (CLOSE_WAIT)
java 144319 kafkagod 2129u IPv4 30012962 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38110 (ESTABLISHED)
java 144319 kafkagod 2130u IPv4 30012582 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37872 (CLOSE_WAIT)
java 144319 kafkagod 2132u IPv4 30012963 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38112 (ESTABLISHED)
java 144319 kafkagod 2133u IPv4 30012602 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51622 (CLOSE_WAIT)
java 144319 kafkagod 2135u IPv4 30012964 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51916 (ESTABLISHED)
java 144319 kafkagod 2136u IPv4 30012605 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51626 (CLOSE_WAIT)
java 144319 kafkagod 2139u IPv4 30012965 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51918 (ESTABLISHED)
java 144319 kafkagod 2140u IPv4 30012607 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39754 (CLOSE_WAIT)
java 144319 kafkagod 2141u IPv4 30010735 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37878 (CLOSE_WAIT)
java 144319 kafkagod 2144u IPv4 30010741 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34402 (CLOSE_WAIT)
java 144319 kafkagod 2145u IPv4 30010742 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51648 (CLOSE_WAIT)
java 144319 kafkagod 2149u IPv4 30012623 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51658 (CLOSE_WAIT)
java 144319 kafkagod 2152u IPv4 30012625 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34416 (CLOSE_WAIT)
java 144319 kafkagod 2155u IPv4 30012635 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39778 (CLOSE_WAIT)
java 144319 kafkagod 2157u IPv4 30012636 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39780 (CLOSE_WAIT)
java 144319 kafkagod 2162u IPv4 29630161 0t0 TCP mwkafka-prod-02.tbd:45254->u-sonar-sonarpri.sdlb:8826 (ESTABLISHED)
java 144319 kafkagod 2165u IPv4 30012639 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37916 (CLOSE_WAIT)
java 144319 kafkagod 2168u IPv4 30012640 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37918 (CLOSE_WAIT)
java 144319 kafkagod 2169u sock 0,7 0t0 30006888 protocol: TCP
java 144319 kafkagod 2172u IPv4 30012656 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51714 (CLOSE_WAIT)
java 144319 kafkagod 2173u IPv4 30012659 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51718 (CLOSE_WAIT)
java 144319 kafkagod 2176u sock 0,7 0t0 30006891 protocol: TCP
java 144319 kafkagod 2179u sock 0,7 0t0 30012426 protocol: TCP
java 144319 kafkagod 2180u sock 0,7 0t0 30012427 protocol: TCP
java 144319 kafkagod 2183u sock 0,7 0t0 30012429 protocol: TCP
java 144319 kafkagod 2184u sock 0,7 0t0 30012432 protocol: TCP
java 144319 kafkagod 2186u sock 0,7 0t0 30012437 protocol: TCP
java 144319 kafkagod 2187u sock 0,7 0t0 30012459 protocol: TCP
java 144319 kafkagod 2188u sock 0,7 0t0 30012696 protocol: TCP
java 144319 kafkagod 2189u IPv4 30012718 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34436 (CLOSE_WAIT)
java 144319 kafkagod 2191u sock 0,7 0t0 30012720 protocol: TCP
java 144319 kafkagod 2192u IPv4 30009662 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.nyc:34456 (CLOSE_WAIT)
java 144319 kafkagod 2193u sock 0,7 0t0 30009663 protocol: TCP
java 144319 kafkagod 2195u sock 0,7 0t0 30012723 protocol: TCP
java 144319 kafkagod 2196u IPv4 30012727 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37958 (CLOSE_WAIT)
java 144319 kafkagod 2197u sock 0,7 0t0 30012791 protocol: TCP
java 144319 kafkagod 2198u IPv4 30012808 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39818 (CLOSE_WAIT)
java 144319 kafkagod 2199u IPv4 30012818 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39838 (CLOSE_WAIT)
java 144319 kafkagod 2200u IPv4 30012836 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37980 (CLOSE_WAIT)
java 144319 kafkagod 2201u IPv4 30012839 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:37986 (CLOSE_WAIT)
java 144319 kafkagod 2202u IPv4 30012866 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51812 (CLOSE_WAIT)
java 144319 kafkagod 2204u IPv4 30012867 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51814 (CLOSE_WAIT)
java 144319 kafkagod 2205u IPv4 30012872 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.nyc:51846 (CLOSE_WAIT)
java 144319 kafkagod 2206u IPv4 30012873 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-01.dr:39890 (CLOSE_WAIT)
java 144319 kafkagod 2207u IPv4 30012894 0t0 TCP mwkafka-prod-02.tbd:9092->mwkafka-prod-02.dr:38000 (CLOSE_WAIT)
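To quantify the CLOSE_WAIT build-up and see which peers it involves, the listing can be summarized by TCP state and by remote host. A socket in CLOSE_WAIT is one whose remote side has already closed the connection while the local process has not yet closed its end. This is a minimal sketch that assumes IPv4 lines in exactly the `lsof` format above (captured with something like `lsof -p <pid>`, which is an assumption about how the listing was produced); the function names are hypothetical.

```python
import re
from collections import Counter

# An lsof TCP line ends in "local->remote (STATE)"; LISTEN sockets have no "->".
TCP_CONN = re.compile(r"TCP (\S+)->(\S+) \((\w+)\)\s*$")


def tcp_states(lsof_lines):
    """Tally TCP connection states (ESTABLISHED, CLOSE_WAIT, ...) in lsof output."""
    return Counter(m.group(3) for m in map(TCP_CONN.search, lsof_lines) if m)


def close_wait_by_peer(lsof_lines):
    """Count CLOSE_WAIT sockets per remote host, ignoring the remote port."""
    counts = Counter()
    for line in lsof_lines:
        m = TCP_CONN.search(line)
        if m and m.group(3) == "CLOSE_WAIT":
            counts[m.group(2).rsplit(":", 1)[0]] += 1
    return counts
```

Pass a list of lines rather than a one-shot iterator if both functions are called on the same input, since each one consumes what it is given.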

Update:

More information from the thread dump: https://issues.apache.org/jira/secure/attachment/12958532/threadump20190212.txt

In the thread dump attached to https://issues.apache.org/jira/browse/KAFKA-7925, I see that 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-0' has locked '0x00000006ca1c9a80' and doesn't seem to be making progress. The other network threads, 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-1' and 'kafka-network-thread-6-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2', are waiting to lock '0x00000006ca1c9a80'. This is causing the Kafka brokers to stop accepting new connection requests.
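The holder/waiter relationship described above can be extracted mechanically from a `jstack`-style dump. This is a minimal sketch that assumes the standard HotSpot text format, in which each thread starts with a quoted name and monitors appear as `- locked <0x...>` and `- waiting to lock <0x...>` lines. It only covers intrinsic (`synchronized`) monitors, not `java.util.concurrent` locks (shown as `- parking to wait for`), and a thread blocked in `Object.wait()` may still list a monitor as `locked` in an outer frame; the function name is hypothetical.

```python
import re
from collections import defaultdict

THREAD_HEADER = re.compile(r'^"([^"]+)"')  # quoted thread name at line start
LOCKED = re.compile(r"- locked <(0x[0-9a-f]+)>")
WAITING = re.compile(r"- waiting to lock <(0x[0-9a-f]+)>")


def monitor_contention(dump_text):
    """Map each contended monitor address to (holder thread, [blocked threads])."""
    holders = {}
    waiters = defaultdict(list)
    current = None
    for line in dump_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        if current is None:
            continue
        locked = LOCKED.search(line)
        if locked:
            # Keep the first thread seen holding this monitor.
            holders.setdefault(locked.group(1), current)
            continue
        waiting = WAITING.search(line)
        if waiting:
            waiters[waiting.group(1)].append(current)
    return {addr: (holders.get(addr), names) for addr, names in waiters.items()}
```

Only monitors that at least one thread is waiting for appear in the result, so an empty dict means no contention on intrinsic locks was found in that dump.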

Is this some kind of bug with GSSAPI?

asked Feb 14 '19 14:02 by xabhi



1 Answer

This could be a bug in 2.1.0, fixed in 2.1.1:

https://issues.apache.org/jira/browse/KAFKA-7697

See: Too many TCP ports in CLOSE WAIT condition in kafka broker

answered Nov 10 '22 15:11 by tgrez