 

ParDo vs FlatMap in Apache Beam?

Is there a difference between ParDo and FlatMap in Dataflow / Apache Beam?

As I understand it, both apply a function to each element of the incoming PCollection and return an iterable, but I imagine there must be some difference?

Asked Apr 27 '17 at 02:04 by Maximilian


1 Answer

FlatMap is a simpler operation that is built, as you might expect, on top of ParDo. If it fits your needs, it is a good choice.
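To make the "built on top of ParDo" relationship concrete, here is a toy, pure-Python model with no Beam dependency. The names `ToyDoFn`, `run_pardo`, `flat_map`, and `SplitWords` are hypothetical and are not Beam's API; the model only assumes that a ParDo calls a DoFn's `process` method once per element and flattens whatever it yields. In the real Python SDK you would instead apply `beam.FlatMap(fn)` or `beam.ParDo(SomeDoFn())` to a PCollection.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


class ToyDoFn:
    """Hypothetical stand-in for a DoFn: process() yields zero or more outputs per element."""

    def process(self, element):
        raise NotImplementedError


def run_pardo(dofn: ToyDoFn, elements: Iterable) -> List:
    """Toy ParDo: call dofn.process on every element and collect everything it yields."""
    outputs = []
    for element in elements:
        outputs.extend(dofn.process(element))
    return outputs


class _CallableWrapperDoFn(ToyDoFn):
    """Wraps a plain function that returns an iterable so it can run as a DoFn."""

    def __init__(self, fn: Callable[[T], Iterable[U]]):
        self._fn = fn

    def process(self, element: T) -> Iterator[U]:
        yield from self._fn(element)


def flat_map(fn: Callable[[T], Iterable[U]], elements: Iterable[T]) -> List[U]:
    """Toy FlatMap: nothing more than a ParDo over a DoFn that wraps fn."""
    return run_pardo(_CallableWrapperDoFn(fn), elements)


class SplitWords(ToyDoFn):
    """The same splitting logic written as an explicit DoFn."""

    def process(self, element: str) -> Iterator[str]:
        yield from element.split()


lines = ["to be", "or not", ""]
via_flat_map = flat_map(lambda line: line.split(), lines)
via_pardo = run_pardo(SplitWords(), lines)
print(via_flat_map)  # ['to', 'be', 'or', 'not']
assert via_flat_map == via_pardo
```

Both paths produce the same output; the FlatMap version simply hides the DoFn boilerplate, which is why it tends to read better when nothing else is needed.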

ParDo is a lower-level building block for element-wise computation. It has additional capabilities such as side inputs, multiple output collections, access to the current window, low-level callbacks for starting and committing bundles of elements, and more.
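Those extras can be sketched with a second toy model, again pure Python and not Beam's API: `Tagged`, `ValidateAndRoute`, and `run_pardo` are hypothetical names, and windows and bundles are simplified to plain strings and lists. The DoFn receives a side input, reads its element's window, routes values to multiple named outputs, and uses per-bundle `start_bundle`/`finish_bundle` callbacks. In the real Python SDK the rough counterparts are `beam.pvalue.TaggedOutput` with `.with_outputs(...)`, `beam.DoFn.WindowParam`, side inputs passed as extra `ParDo` arguments (for example wrapped in `beam.pvalue.AsList`), and the `DoFn.start_bundle`/`DoFn.finish_bundle` methods; their exact signatures and constraints differ from this model.

```python
from collections import defaultdict
from typing import Dict, List


class Tagged:
    """Hypothetical marker that routes a value to a named secondary output."""

    def __init__(self, tag: str, value):
        self.tag = tag
        self.value = value


class ValidateAndRoute:
    """Toy DoFn exercising side input, window access, multiple outputs, bundle callbacks."""

    def start_bundle(self):
        # Per-bundle setup: reset a counter for the elements in this bundle.
        self.seen_in_bundle = 0

    def process(self, element, window, allowed: set):
        # 'allowed' plays the role of a side input; 'window' of the element's window.
        self.seen_in_bundle += 1
        name, value = element
        if name in allowed:
            yield (name, value, window)  # main output
        else:
            yield Tagged("rejected", name)  # secondary output

    def finish_bundle(self):
        # Per-bundle teardown: emit how many elements this bundle contained.
        yield Tagged("bundle_sizes", self.seen_in_bundle)


def run_pardo(dofn, bundles, side_input) -> Dict[str, List]:
    """Toy runner: drives the bundle callbacks and collects outputs by tag."""
    outputs = defaultdict(list)

    def emit(value):
        if isinstance(value, Tagged):
            outputs[value.tag].append(value.value)
        else:
            outputs["main"].append(value)

    for bundle in bundles:
        dofn.start_bundle()
        for element, window in bundle:
            for value in dofn.process(element, window, side_input):
                emit(value)
        # Tolerate a finish_bundle that returns None instead of yielding.
        for value in dofn.finish_bundle() or ():
            emit(value)
    return dict(outputs)


# Each bundle is a list of (element, window) pairs.
bundles = [
    [(("alice", 3), "w0"), (("mallory", 9), "w0")],
    [(("bob", 5), "w1")],
]
result = run_pardo(ValidateAndRoute(), bundles, side_input={"alice", "bob"})
print(result["main"])          # [('alice', 3, 'w0'), ('bob', 5, 'w1')]
print(result["rejected"])      # ['mallory']
print(result["bundle_sizes"])  # [2, 1]
```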

In practice, many uses of FlatMap and ParDo end up with a similar amount of code, but in my opinion the most readable choice is the simplest (highest-level) transform available.

Answered Sep 29 '22 at 08:09 by Kenn Knowles