
How to check the actual number of incremental fetch session cache slots used in Kafka cluster?

Tags: apache-kafka

I am reading the question "Kafka: Continuously getting FETCH_SESSION_ID_NOT_FOUND" and trying to apply the solution suggested by Hrishikesh Mishra, since we face a similar issue. I increased the broker setting max.incremental.fetch.session.cache.slots from its default of 1000 to 2000. Now I wonder how to monitor the actual number of incremental fetch session cache slots in use. In Prometheus I see the metric kafka_server_fetchsessioncache_numincrementalfetchpartitionscached, and a PromQL query shows values well above 2000 on each of the three brokers (2703, 2655 and 2054), so I am not sure I am looking at the right metric. There is also kafka_server_fetchsessioncache_incrementalfetchsessionevictions_total, which shows zero on all brokers.

OK, there is also kafka_server_fetchsessioncache_numincrementalfetchsessions, which shows about 500 on each of the three brokers, for a total of about 1500. That is between 1000 and 2000, so maybe this is the metric controlled by max.incremental.fetch.session.cache.slots?

Actually, by now there are more than 700 incremental fetch sessions on each broker, more than 2100 in total, so evidently the limit of 2000 applies to each broker and the whole cluster can hold up to 6000. The number is currently below 1000 on each broker because the brokers were restarted after the configuration change.

The question is how this allocation can be checked at the level of an individual consumer. A query such as:

count by (__name__) ({__name__=~".*fetchsession.*"})

returns only this table:

Element                                                             Value
kafka_server_fetchsessioncache_incrementalfetchsessionevictions_total{} 3
kafka_server_fetchsessioncache_numincrementalfetchpartitionscached{}    3
kafka_server_fetchsessioncache_numincrementalfetchsessions{}            3
hdjur_jcv asked Mar 02 '23 23:03



1 Answer

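The metric kafka.server:type=FetchSessionCache,name=NumIncrementalFetchSessions (the one that appears in your Prometheus output as kafka_server_fetchsessioncache_numincrementalfetchsessions) is the correct one to monitor the number of FetchSessions.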

The cache size is configurable via max.incremental.fetch.session.cache.slots. Note that this setting is applied per broker, so each broker can cache up to max.incremental.fetch.session.cache.slots sessions.
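Because the limit is per broker, slot usage is best judged broker by broker. The following is a minimal sketch of that arithmetic, not part of Kafka or Prometheus: the function name `slot_utilization`, the broker labels, and the session counts (taken loosely from the "more than 700" figure in the question) are illustrative assumptions.

```python
def slot_utilization(sessions_per_broker, slots_per_broker):
    """Return the fraction of fetch session cache slots in use on each broker."""
    if slots_per_broker <= 0:
        raise ValueError("slots_per_broker must be positive")
    return {
        broker: count / slots_per_broker
        for broker, count in sessions_per_broker.items()
    }


# Illustrative values only: broker names are hypothetical, and 700 approximates
# the NumIncrementalFetchSessions reading mentioned in the question.
sessions = {"broker-1": 700, "broker-2": 700, "broker-3": 700}
slots = 2000  # value set for max.incremental.fetch.session.cache.slots

utilization = slot_utilization(sessions, slots)
# The limit applies to each broker, so cluster capacity scales with broker count.
cluster_capacity = slots * len(sessions)

for broker, fraction in utilization.items():
    print(f"{broker}: {fraction:.0%} of slots used")
print(f"cluster-wide capacity: {cluster_capacity} sessions")
```

In practice the session counts would come from the NumIncrementalFetchSessions metric on each broker rather than from hard-coded values.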

The other metric you saw, kafka.server:type=FetchSessionCache,name=NumIncrementalFetchPartitionsCached, is the total number of partitions used across all FetchSessions. Many FetchSessions use several partitions, so it is expected that this number is larger than the number of sessions.
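To see why the partition count can exceed the slot limit, it can help to divide it by the session count. This is a rough sketch using the partition figures from the question (2703, 2655, 2054) and an approximate 500 sessions per broker; the helper name `partitions_per_session` and the broker labels are assumptions made for illustration.

```python
def partitions_per_session(partitions_cached, sessions):
    """Average number of cached partitions per incremental fetch session."""
    if sessions == 0:
        return 0.0  # no sessions, so there is nothing to average
    return partitions_cached / sessions


# Partition counts are the values reported in the question; the session count
# of 500 per broker is the approximate figure given there, not an exact reading.
partitions_cached = {"broker-1": 2703, "broker-2": 2655, "broker-3": 2054}
approx_sessions = 500

averages = {
    broker: partitions_per_session(count, approx_sessions)
    for broker, count in partitions_cached.items()
}

for broker, avg in averages.items():
    print(f"{broker}: about {avg:.1f} partitions per session")
```

With a few partitions per session on average, a partition total above 2000 is consistent with a session count well below the configured slot limit.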

As you said, the low number of FetchSessions you saw was likely due to the restart.

Mickael Maison answered Mar 05 '23 11:03
