While reading about processing streaming elements in Apache Beam using Java, I came across DoFn<InputT, OutputT> and then SimpleFunction<InputT, OutputT>. The two look similar to me, and I find it difficult to understand the difference. Can someone explain the difference in layman's terms?


Apache Beam: What is the difference between DoFn and SimpleFunction?

1 Answer

Conceptually, you can think of SimpleFunction as a simple case of DoFn:

SimpleFunction<InputT, OutputT>:
- simple input to output mapping function;
- single input produces single output;
- has a fixed, statically checked signature: you @Override the apply() method;
- doesn't depend on computation context;
- can't use Beam state APIs;
- example use case: MapElements.via(simpleFunction) to convert/modify elements one by one, producing one output for each element;
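
The one-in, one-out shape described above can be sketched in plain Java without pulling in Beam. This is only an analogy under stated assumptions: MapFn and mapElements are hypothetical stand-ins written for this example, not Beam classes. In Beam itself the corresponding pieces are SimpleFunction.apply() and MapElements.via(...).

```java
import java.util.ArrayList;
import java.util.List;

public class OneToOneSketch {
    // Hypothetical stand-in for the shape of SimpleFunction:
    // a single method to override, one input in, one output out.
    abstract static class MapFn<I, O> {
        abstract O apply(I input);
    }

    // Hypothetical stand-in for what MapElements.via(fn) does conceptually:
    // call apply() once per element and keep exactly one result per element.
    static <I, O> List<O> mapElements(List<I> inputs, MapFn<I, O> fn) {
        List<O> outputs = new ArrayList<>();
        for (I input : inputs) {
            outputs.add(fn.apply(input));
        }
        return outputs;
    }

    // A concrete function: no context, no state, just input -> output.
    static class ToUpperCase extends MapFn<String, String> {
        @Override
        String apply(String input) {
            return input.toUpperCase();
        }
    }

    public static void main(String[] args) {
        // prints [BEAM, JAVA]
        System.out.println(mapElements(List.of("beam", "java"), new ToUpperCase()));
    }
}
```

Because apply() has to return exactly one value, a function of this shape cannot drop an element or fan it out into several; that limitation is what the DoFn list addresses.
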
DoFn<InputT, OutputT>:
- executed with ParDo;
- has access to the element's context (timestamp, window, pane, etc.);
- can consume side inputs;
- can produce multiple outputs or no outputs at all;
- can produce side outputs (called additional outputs in current Beam versions);
- can use Beam's persistent state APIs;
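- flexible method signature: the processing method is marked with @ProcessElement, and its parameters are matched by annotation and type rather than by overriding a fixed method;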
- example use case: read objects from a stream, filter, accumulate them, perform aggregations, convert them, and dispatch to different outputs;
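
To illustrate "multiple outputs or no outputs at all" and "dispatch to different outputs", here is a plain-Java sketch rather than real Beam code. The process method and its two Consumer parameters are assumptions made for this example. In Beam the analogous pieces would be a DoFn method annotated with @ProcessElement, an OutputReceiver for the main output, and TupleTags for additional outputs, all run through ParDo.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ZeroOrManySketch {
    // Hypothetical stand-in for a DoFn's processing method: instead of
    // returning one value, it is handed emitters and may call them
    // zero, one, or many times per input element.
    static void process(String line, Consumer<Integer> numbers, Consumer<String> rejects) {
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank line: nothing is emitted at all
            }
            try {
                numbers.accept(Integer.parseInt(token)); // main output
            } catch (NumberFormatException e) {
                rejects.accept(token); // second, separate output
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        List<String> rejects = new ArrayList<>();
        for (String line : List.of("1 2 x", "   ", "7")) {
            process(line, numbers::add, rejects::add);
        }
        System.out.println(numbers); // prints [1, 2, 7]
        System.out.println(rejects); // prints [x]
    }
}
```

Three input lines produce three values on the main output and one on the second output, so the number of outputs is decoupled from the number of inputs. The sketch leaves out what only a runner can provide, such as timestamps, windows, side inputs, and persistent state.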

You can find more specific examples and use cases for ParDos in the dev guide.

The guide also covers MapElements, which is the main use case for SimpleFunction.

answered Oct 24 '22 01:10

Anton


Tags:

java

apache-beam

kaxil
