How to log uncaught exceptions during Flink job execution

Question

I am trying to attach Sentry to our Flink cluster to track job execution. Sentry acts as a logger which captures messages and sends them to a central server. By default it captures all messages with level WARN or higher.

To get Sentry to catch all problems, I need to write a WARN or ERROR log message whenever an operator raises an uncaught exception. If the restart strategy fails, the execute() method in the Execution Environment will throw the final exception, which I can log appropriately. But I have yet not found a way to log exceptions which cause the job to restart. Flink logs them as INFO messages, but that makes them difficult to filter from the rest.

What is the appropriate way to handle uncaught exceptions in Flink jobs?

Till Rohrmann · Accepted Answer

From Flink's perspective, user code errors are expected and, hence, Flink does not log them on WARN or ERROR. WARN and ERROR are reserved for logging statements which indicate that something is wrong with Flink itself.

The best option to capture task failures is to grep for <TASK_NAME> switched from RUNNING to FAILED. That way you will be notified whenever <TASK_NAME> failed. Note, however, that it is not guaranteed that the logging statement will never change.

How to log uncaught exceptions during Flink job execution

Tags:

logging

apache-flink

sentry

Tzanko Matev

1 Answers

Till Rohrmann

Recent Activity

Donate For Us

How to log uncaught exceptions during Flink job execution

Tags:

logging

apache-flink

sentry

Tzanko Matev

1 Answers

Till Rohrmann

Related questions

Recent Activity

Donate For Us