In Apache Flink I have a stream of tuples. Let's assume a really simple Tuple1<String>
. The tuple can have an arbitrary value in it's value field (e.g. 'P1', 'P2', etc.). The set of possible values is finite but I don't know the full set beforehand (so there could be a 'P362'). I want to write that tuple to a certain output location depending on the value inside of the tuple. So e.g. I would like to have the following file structure:
/output/P1
/output/P2
In the documentation I only found possibilities to write to locations that I know beforehand (e.g. stream.writeCsv("/output/somewhere")
), but no way of letting the contents of the data decide where the data is actually ending up.
I read about output splitting in the documentation but this doesn't seem to provide a way to redirect the output to different destinations the way I would like to have it (or I just don't understand how this would work).
Can this be done with the Flink API, if so, how? If not, is there maybe a third party library that can do it or would I have to build such a thing on my own?
Side outputs are defined within an operator (typically a ProcessFunction or window operator) that apply arbitrary logic and feature multiple outputs.
An OutputTag is a typed and named tag to use for tagging side outputs of an operator. An OutputTag must always be an anonymous inner class so that Flink can derive a TypeInformation for the generic type parameter.
You can implement a custom sink. Inherit from one of both:
org.apache.flink.streaming.api.functions.sink.SinkFunction
org.apache.flink.streaming.api.functions.sink.RichSinkFunction
In your program use:
stream.addSink(SinkFunction<T> sinkFunction);
instead of stream.writeCsv("/output/somewhere")
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With