Both <code>DoFn</code> and <code>PTransform</code> are ways to define an operation on a <code>PCollection</code>. How do we know which one to use, and when?

A simple way to understand it is by analogy with <code>map(f)</code> for lists:
<ul>
<li>The higher-order function <code>map</code> applies a function to each element of a list and returns a new list of the results. You might call it a computational pattern.</li>
<li>The function <code>f</code> is the logic applied to each element.</li>
</ul>
Turning to Beam specifics, I think you are asking about <code>ParDo.of(fn)</code>, which is a <code>PTransform</code>.
<ul>
<li>A <code>PTransform</code> is an operation that takes <code>PCollection</code>s as input and yields <code>PCollection</code>s as output. Beam has just five primitive types of <code>PTransform</code>, encapsulating embarrassingly parallel computational patterns.</li>
<li><code>ParDo</code> is the computational pattern of per-element computation. It has some variations, but you don't need to worry about them for this question.</li>
<li>The <code>DoFn</code>, which I called <code>fn</code> here, is the logic applied to each element.</li>
</ul>
It may also help to remember that you write a <code>DoFn</code> to say what to do with each element, and the Beam runner provides the <code>ParDo</code> that applies your logic.
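To make the analogy concrete, here is a minimal sketch in plain Java. It does not use the Beam SDK: <code>ToyDoFn</code>, <code>ToyPTransform</code>, and <code>parDoOf</code> are hypothetical stand-ins invented for illustration, and a plain <code>List</code> plays the role of a <code>PCollection</code>. It also assumes exactly one output per input element, which is a simplification of what a real <code>DoFn</code> can do.

```java
import java.util.ArrayList;
import java.util.List;

public class ToyPipeline {
    // Hypothetical stand-in for a DoFn: only the per-element logic (the "f" in map(f)).
    public interface ToyDoFn<I, O> {
        O processElement(I element);
    }

    // Hypothetical stand-in for a PTransform: an operation on a whole collection.
    public interface ToyPTransform<I, O> {
        List<O> expand(List<I> input);
    }

    // Hypothetical stand-in for ParDo.of(fn): wraps per-element logic in the
    // "apply it to every element" pattern (the "map" in map(f)).
    public static <I, O> ToyPTransform<I, O> parDoOf(ToyDoFn<I, O> fn) {
        return input -> {
            List<O> output = new ArrayList<>();
            for (I element : input) {
                output.add(fn.processElement(element));
            }
            return output;
        };
    }

    // You write the element logic; the pattern decides how it is applied.
    public static List<Integer> wordLengths(List<String> words) {
        ToyDoFn<String, Integer> lengthFn = String::length;
        ToyPTransform<String, Integer> transform = parDoOf(lengthFn);
        return transform.expand(words);
    }

    public static void main(String[] args) {
        System.out.println(wordLengths(List.of("beam", "do", "fn"))); // prints [4, 2, 2]
    }
}
```

In this toy model, <code>lengthFn</code> knows nothing about collections and <code>parDoOf</code> knows nothing about strings, which is the same division of labor as between a <code>DoFn</code> and <code>ParDo</code>.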

Apache Beam: DoFn vs PTransform

1 Answer

answered Oct 23 '22 11:10

Kenn Knowles

Tags: apache-beam, google-cloud-dataflow

user_1357
