
Apache Beam: FlatMap vs Map?

I want to understand in which scenarios I should use FlatMap and in which I should use Map. The documentation was not clear to me.

Could someone give me an example so I can understand their difference?

I understand the difference between flatMap and map in Spark, but I am not sure whether the Beam transforms work the same way.

Emma Y asked Aug 14 '17 09:08




1 Answer

These transforms in Beam work the same way as their counterparts in Spark (and in Scala).

A Map transform maps a PCollection of N elements into another PCollection of N elements: the function is called once per element and produces exactly one output element.

A FlatMap transform maps a PCollection of N elements into N collections of zero or more elements each, which are then flattened into a single PCollection.
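To make the two definitions concrete without needing Beam installed, here is a minimal plain-Python model of the semantics. The helper names `map_model` and `flat_map_model` are hypothetical and are not part of any Beam API; they use ordinary lists, which keep their order, whereas a real PCollection makes no ordering guarantee.

```python
from itertools import chain


def map_model(fn, elements):
    """Model of Map: fn is called once per element and yields exactly one output."""
    return [fn(x) for x in elements]


def flat_map_model(fn, elements):
    """Model of FlatMap: fn returns an iterable per element; results are flattened one level."""
    return list(chain.from_iterable(fn(x) for x in elements))


nums = [1, 2, 3]

# One output per input: the result always has len(nums) elements.
mapped = map_model(lambda x: x * 10, nums)

# Each input may produce several outputs.
repeated = flat_map_model(lambda x: [x] * x, nums)

# Each input may also produce zero outputs, so FlatMap can act like a filter.
evens = flat_map_model(lambda x: [x] if x % 2 == 0 else [], nums)

print(mapped)    # [10, 20, 30]
print(repeated)  # [1, 2, 2, 3, 3, 3]
print(evens)     # [2]
```

As a rule of thumb under this model: reach for Map when every input produces exactly one output, and for FlatMap when an input may produce none or many.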

As a simple example, the following happens:

beam.Create([1, 2, 3]) | beam.Map(lambda x: [x, 'any'])
# The result is a collection of THREE lists:
# [[1, 'any'], [2, 'any'], [3, 'any']]

Whereas:

beam.Create([1, 2, 3]) | beam.FlatMap(lambda x: [x, 'any'])
# The lists that are output by the lambda are then flattened into a
# collection of SIX single elements: [1, 'any', 2, 'any', 3, 'any']

Pablo answered Oct 20 '22 12:10