
How to convert PCollection<List<String>> to PCollection<String> in dataflow/beam

I have a use case in which a DoFn needs to output multiple elements of type T for each input. At the moment the DoFn emits a List<T>, so the result is a PCollection<List<T>>. I want to convert it to a PCollection<T> so that later in the pipeline I can filter it directly, like:

PCollection<T> filteredT = filterationResult.apply(Filter.byPredicate(p -> p.equals(T) == T));

The best approach I can think of so far is, instead of returning List<T> from the ParDo function, to return KV<String, List<T>> with the same key for every item. Then in the pipeline I can do the following to combine the results:

filterationResult.apply("Group", GroupByKey.<String, List<T>>create())

Or can I call c.output(T) from the DoFn (where c is the ProcessContext object passed in) multiple times?

PUG asked Feb 04 '23 07:02


1 Answer

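Yes, you can call c.output(T) from a DoFn multiple times while processing a single input element. Each call adds one element to the output PCollection<T>, so no List<T> wrapper or shared key is needed.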

There is also a library transform, Flatten.iterables(), which turns a PCollection of iterables (such as PCollection<List<T>>) into a PCollection<T>, but you don't need it in this case.
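As a rough illustration of both options, here is a minimal sketch using the Beam Java SDK. The class name MultiOutputExample, the SplitFn DoFn, the splitTokens helper, and the comma-separated sample input are all made up for this example and are not from the question. It also assumes the Beam SDK and a runner (for instance the direct runner) are on the classpath.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

public class MultiOutputExample {

  // Hypothetical helper: the "multiple results per input" logic, kept in plain Java.
  static List<String> splitTokens(String line) {
    return Arrays.asList(line.split(","));
  }

  // Hypothetical DoFn: calls c.output(...) once per token, so a single input
  // element can produce several output elements.
  static class SplitFn extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
      for (String token : splitTokens(c.element())) {
        c.output(token);
      }
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Option 1: emit each item directly from the DoFn.
    PCollection<String> tokens =
        p.apply("Lines", Create.of("a,b", "c"))
            .apply("Split", ParDo.of(new SplitFn()));

    // Option 2: if a PCollection<List<String>> already exists, flatten it.
    PCollection<List<String>> lists =
        p.apply(
            "Lists",
            Create.of(Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c")))
                .withCoder(ListCoder.of(StringUtf8Coder.of())));
    PCollection<String> flattened =
        lists.apply("FlattenLists", Flatten.<String>iterables());

    p.run().waitUntilFinish();
  }
}
```

Either way the result is a plain PCollection<String> that a later filter step can consume; the first option simply avoids building the intermediate lists.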

Kenn Knowles answered Mar 07 '23 19:03