Is there a way to determine where messages came from in a Kafka topic?

Tags:

apache-kafka

Large amounts of data are being pushed into one of our Kafka topics. Is there a way to determine which producer this data is coming from?

Asked Dec 22 '22 19:12 by AnonymousAlias


2 Answers

Without SASL authentication or Authorizer-level auditing, no, there is no easy way, other than tracking down suspicious connected client-ids via JMX.

I would suggest enforcing a standard message format and spreading the word to the producer teams. For example, look at the CloudEvents spec, which includes a source field:

https://github.com/cloudevents/spec/blob/master/kafka-protocol-binding.md
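As a rough illustration of that idea, here is a minimal sketch, assuming producers use the binding's binary content mode, where CloudEvents attributes travel as Kafka headers prefixed with `ce_` (so the source ends up in `ce_source`). The helper names `cloudevents_headers` and `count_by_source` are hypothetical, and headers are modeled as a list of `(str, bytes)` tuples, the shape kafka-python uses for record headers. The source URIs and event types in the usage lines are made-up examples.

```python
import uuid
from collections import Counter
from typing import Iterable, List, Tuple

# A Kafka record header as a (key, value) pair; this shape is an assumption.
Header = Tuple[str, bytes]


def cloudevents_headers(source: str, event_type: str) -> List[Header]:
    """Hypothetical helper: build the required CloudEvents attributes as ce_-prefixed headers."""
    return [
        ("ce_specversion", b"1.0"),
        ("ce_id", str(uuid.uuid4()).encode("utf-8")),  # unique id per event
        ("ce_source", source.encode("utf-8")),          # identifies the producing service
        ("ce_type", event_type.encode("utf-8")),
    ]


def count_by_source(records_headers: Iterable[List[Header]]) -> Counter:
    """Hypothetical helper: tally records per ce_source; records without one count as '<unknown>'."""
    counts: Counter = Counter()
    for headers in records_headers:
        source = next(
            (value.decode("utf-8") for key, value in headers if key == "ce_source"),
            "<unknown>",
        )
        counts[source] += 1
    return counts


# Usage with made-up sources: three well-behaved records and one legacy record with no headers.
batch = [
    cloudevents_headers("/orders-service", "com.example.order.created"),
    cloudevents_headers("/orders-service", "com.example.order.created"),
    cloudevents_headers("/billing-service", "com.example.invoice.sent"),
    [],
]
print(count_by_source(batch).most_common())
```

The `<unknown>` bucket is the useful part in practice: once the convention is in place, anything landing there points at a producer that has not adopted the format yet.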

Answered May 07 '23 23:05 by OneCricketeer


You can enable quotas for the clients/users and then monitor which clients get throttled via two quota-related JMX MBeans, one for bandwidth and one for request rate:

Metric: Bandwidth quota metrics per (user, client-id), user or client-id
MBean: kafka.server:type={Produce|Fetch},user=([-.\w]+),client-id=([-.\w]+)
What it shows: Two attributes. throttle-time indicates the amount of time in ms the client was throttled (ideally 0). byte-rate indicates the data produce/consume rate of the client in bytes/sec. For (user, client-id) quotas, both user and client-id are specified. If a per-client-id quota is applied to the client, user is not specified. If a per-user quota is applied, client-id is not specified.

Metric: Request quota metrics per (user, client-id), user or client-id
MBean: kafka.server:type=Request,user=([-.\w]+),client-id=([-.\w]+)
What it shows: Two attributes. throttle-time indicates the amount of time in ms the client was throttled (ideally 0). request-time indicates the percentage of time spent in broker network and I/O threads to process requests from the client group. For (user, client-id) quotas, both user and client-id are specified. If a per-client-id quota is applied to the client, user is not specified. If a per-user quota is applied, client-id is not specified.
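Once the byte-rate values have been read off the broker (with JConsole, a JMX exporter, or whatever JMX tooling you already have), picking out the heavy producer is mostly a matter of parsing the MBean names above. This is a minimal sketch under those assumptions: `parse_mbean_name` and `top_producers` are hypothetical helpers, the sample names and rates are invented, and values are assumed to be unquoted (matching the `[-.\w]+` pattern), so quoted ObjectName values are not handled. A missing or empty `user`/`client-id` key, as happens with per-client-id or per-user quotas, is shown as `<none>`.

```python
from typing import Dict, List, Tuple


def parse_mbean_name(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:k1=v1,k2=v2' into the domain and a dict of key properties."""
    domain, _, props = name.partition(":")
    keys = dict(pair.split("=", 1) for pair in props.split(",") if pair)
    return domain, keys


def top_producers(byte_rates: Dict[str, float], limit: int = 5) -> List[Tuple[str, str, float]]:
    """Rank (user, client-id, byte-rate) rows from Produce quota MBeans, highest rate first."""
    rows: List[Tuple[str, str, float]] = []
    for name, rate in byte_rates.items():
        domain, keys = parse_mbean_name(name)
        # Only produce-side bandwidth MBeans are relevant for finding a noisy producer.
        if domain != "kafka.server" or keys.get("type") != "Produce":
            continue
        rows.append((keys.get("user") or "<none>", keys.get("client-id") or "<none>", rate))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]


# Invented sample: byte-rate attribute values keyed by MBean name.
samples = {
    "kafka.server:type=Produce,user=alice,client-id=orders-app": 1_200.0,
    "kafka.server:type=Produce,client-id=mystery-app": 9_800_000.0,
    "kafka.server:type=Fetch,user=bob,client-id=reporting": 50_000.0,
}
for user, client_id, rate in top_producers(samples):
    print(f"client-id={client_id:<12} user={user:<7} {rate:>12,.0f} B/s")
```

This only tells you which client-id (and user, if authenticated) is sending the bytes; mapping that back to a team still depends on producers setting meaningful `client.id` values.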

Answered May 08 '23 01:05 by mazaneicha