I am currently wondering how to handle application errors in Apache Flink streaming applications. In general, I see two cases:
For the first case, it looks like the common solution is to just throw some exception. Or is there a better way, e.g. a special kind of exception for more efficient handling such as FailedException
from Apache Storm Trident (see Error handling in Storm Trident topologies).
For permanent errors, I couldn't find any information online. A map()
operation, for example, always has to return something so one cannot just silently drop messages as you would in Trident.
What are the available APIs or best practices? Thanks for your help.
Since this question was asked, there has been some development:
This discussion holds the background of why side outputs should help, key extract:
Side outputs(a.k.a Multi-outputs) is one of highly requested features in high fidelity stream processing use cases. With this feature, Flink can
- Side output corrupted input data and avoid job fall into “fail -> restart -> fail” cycle
- Side output sparsely received late arriving events while issuing aggressive watermarks in window computation.
This resulted in jira: FLINK-4460 which has been resolved in Flink 1.1.3 and above.
I hope this helps, if an even more generic solution would be desireable, please think a bit on your usecase and consider to create a jira for it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With