Are there any alerting options for scenarios where a Kafka Connect Connector or a Connector task fails or experiences errors?
We have Kafka Connect running, and it runs well, but we've had errors that had to be discovered and traced manually. Often it has been in an error state for a week before a human noticed the problem.
(I still can't comment, so this is a response to clay's answer.)
NOTE: There are two caveats with the JMX metrics for task/connector status, the first of which is a bug (at time of posting: 5/11/2020):
1) When a task fails, its status metrics disappear. This is a known issue and a fix is in progress. A Jira can be found here and a PR can be found here.
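Because of that bug, an alert that only fires on a "failed" status value can stay silent forever. One workaround is to treat a *missing* task status as suspect. The sketch below assumes you already scrape per-task status strings into a dict keyed by task id (how you collect them, e.g. via a JMX exporter, is not shown), and that you know which task ids to expect. The function name `task_status_alerts` is hypothetical, not part of Kafka Connect.

```python
from typing import Dict, Iterable, List


def task_status_alerts(expected_task_ids: Iterable[int],
                       observed: Dict[int, str]) -> List[str]:
    """Return alert messages for tasks that are missing or not running.

    `observed` maps task id -> status string scraped from your metrics
    (assumed collection mechanism). A task with no status at all is flagged,
    because a failed task's status metric may disappear entirely.
    """
    alerts = []
    for task_id in sorted(expected_task_ids):
        status = observed.get(task_id)
        if status is None:
            # Absence of the metric is itself a signal.
            alerts.append(f"task {task_id}: no status metric reported (possibly failed)")
        elif status.lower() != "running":
            alerts.append(f"task {task_id}: status is {status}")
    return alerts
```

For example, `task_status_alerts(range(3), {0: "running", 2: "failed"})` flags task 1 as missing and task 2 as failed.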
2) Don't use the Connector metric to monitor the status of the tasks. The Connector can show as running while its tasks are in a failed state, so you need to monitor the tasks directly. This is mentioned in Confluent's Connector monitoring tips, where it says:
In most cases, connector and task states will match, though they may be different for short periods of time when changes are occurring or if tasks have failed. For example, when a connector is first started, there may be a noticeable delay before the connector and its tasks have all transitioned to the RUNNING state. States will also diverge when tasks fail since Connect does not automatically restart failed tasks.
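If you poll the Kafka Connect REST API instead of (or in addition to) JMX, the same rule applies: check each entry in `tasks`, not just the top-level `connector.state`. Below is a minimal sketch, assuming a worker reachable at a URL you supply (8083 is the usual default port) and the standard `GET /connectors/<name>/status` response shape. The helper names are hypothetical. It only flags `FAILED` tasks, so the brief `UNASSIGNED` periods during startup or rebalancing described in the quote don't page anyone.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List


def fetch_connector_status(base_url: str, connector: str) -> Dict[str, Any]:
    """Call GET /connectors/<name>/status on a Kafka Connect worker."""
    name = urllib.parse.quote(connector, safe="")
    url = f"{base_url.rstrip('/')}/connectors/{name}/status"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def failed_tasks(status: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the tasks in FAILED state, regardless of the connector's own state."""
    return [task for task in status.get("tasks", []) if task.get("state") == "FAILED"]


if __name__ == "__main__":
    # Sample payload: the connector reports RUNNING, but task 1 has failed.
    sample = {
        "name": "my-sink",
        "connector": {"state": "RUNNING", "worker_id": "worker1:8083"},
        "tasks": [
            {"id": 0, "state": "RUNNING", "worker_id": "worker1:8083"},
            {"id": 1, "state": "FAILED", "worker_id": "worker2:8083", "trace": "..."},
        ],
        "type": "sink",
    }
    for task in failed_tasks(sample):
        print(f"ALERT: {sample['name']} task {task['id']} is FAILED on {task['worker_id']}")
```

Run something like this from a scheduler for each connector (the list is available from `GET /connectors`) and alert whenever `failed_tasks` returns a non-empty list; the `trace` field usually holds the stack trace worth including in the alert.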