I have a use case in which I need to output multiple T
from a DoFn. So DoFn
function is returning a PCollection<List<T>>
. I want to convert it to PCollection<T>
so that later in pipeline I can just filter like:
PCollection<T> filteredT = filterationResult.apply(Filter.byPredicate(p -> p.equals(T) == T));
Currently the best method I can think of is, instead returning List<T>
from the ParDo
function I return KV<String,List<T>>
with same key for every item. Then in pipeline I can do below to combine result:
filterationResult.apply("Group", GroupByKey.<String, List<T>>create())
Or can I call c.output(T)
from DoFn (where c
is the ProcessContext
object passed in) multiple times?
You can call c.output(T)
from a DoFn
multiple times.
There is also a library transform Flatten.iterables()
but you don't need it in this case.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With