I have implemented MapFunction
for my Apache Flink flow. It is parsing incoming elements and convert them to other format but sometimes error can appear (i.e. incoming data is not valid).
I see two possible ways how to handle it:
So, I have two questions:
MapFunction
?You could use a FlatMapFunction
instead of a MapFunction
. This would allow you to only emit an element if it is valid. The following shows an example implementation:
input.flatMap(new FlatMapFunction<String, Long>() {
@Override
public void flatMap(String input, Collector<Long> collector) throws Exception {
try {
Long value = Long.parseLong(input);
collector.collect(value);
} catch (NumberFormatException e) {
// ignore invalid data
}
}
});
This is to build on @Till Rohrmann's idea above. Adding this as an answer instead of a comment for better formatting.
I think one way to implement "split + select" could be to use a ProcessFunction with a SideOutput. My graph would look something like this:
Source --> ValidateProcessFunction ---good data--> UDF--->SinkToOutput
\
\---bad data----->SinkToErrorChannel
Would this work? Is there a better way?
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With