I have a use case in which I need to output multiple T from a DoFn. So DoFn function is returning a PCollection<List<T>>. I want to convert it to PCollection<T> so that later in pipeline I can just filter like:
PCollection<T> filteredT = filterationResult.apply(Filter.byPredicate(p -> p.equals(T) == T));
Currently the best method I can think of is, instead returning List<T> from the ParDo function I return KV<String,List<T>> with same key for every item. Then in pipeline I can do below to combine result:
filterationResult.apply("Group", GroupByKey.<String, List<T>>create())
Or can I call c.output(T) from DoFn (where c is the ProcessContext object passed in) multiple times?
You can call c.output(T) from a DoFn multiple times.
There is also a library transform Flatten.iterables() but you don't need it in this case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With