
I have a generic input request that contains input which needs to be transformed and saved. If the resulting output needs to be transformed further, I would implement a new processor (transformer) for it.
class Request {
    Input input;              // the data to be transformed
    Transformer transformer;  // the processor that performs the transform
    Output output;            // the result, which may feed a child transform
}
Based on the message read from the source, I generate a graph of the required transforms (depicted in the image), driven by the configuration for each input, say for different customers. So the pipeline I generate for every message is dynamic, i.e.
Graph<Request> gRequests;
A child transform's input depends on the successful completion of its parent transform.
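For illustration, here is a minimal sketch of how such a per-message graph might be built, assuming Guava's common.graph API (which the Graph&lt;Request&gt; declaration suggests, though any DAG structure works); the three Request instances and the way they are derived from customer configuration are hypothetical:

import com.google.common.graph.GraphBuilder;
import com.google.common.graph.MutableGraph;

// Hypothetical: build the DAG of Requests for one message from its customer config.
MutableGraph<Request> gRequests = GraphBuilder.directed().build();

Request parse   = new Request();   // parent transform
Request enrich  = new Request();   // child: consumes the output of parse
Request persist = new Request();   // child: consumes the output of enrich

gRequests.putEdge(parse, enrich);    // enrich runs only after parse succeeds
gRequests.putEdge(enrich, persist);  // persist runs only after enrich succeeds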
Is there any way I can generate a dynamic Apache Beam pipeline for this, or what is the best approach?
As of now, I flatten every transform into a sequential list based on input and output, and every message passes through every pipeline step; each step performs its transform if it is configured for that message, or else just skips it.
Sequential list of transformers.

For every new transform needed, it rearranges the pipeline. That is the expected flow. But if I could chain transforms based on the input and my customer configuration, it would be a whole lot easier.
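For context, a rough sketch of this sequential skip-or-apply approach with Beam's Java SDK; Event, isConfigured, and the transformers map are hypothetical stand-ins for the real message type and configuration lookup:

import java.util.List;
import java.util.Map;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptor;

// Every message passes through every step; a step transforms the element only
// if that step is configured for the element's customer, otherwise it passes
// the element through unchanged.
static PCollection<Event> applySequentially(
    PCollection<Event> events,
    List<String> steps,
    Map<String, SerializableFunction<Event, Event>> transformers) {
  for (String step : steps) {
    events = events.apply(step,
        MapElements.into(TypeDescriptor.of(Event.class))
            .via(e -> e.isConfigured(step)
                ? transformers.get(step).apply(e)   // configured: apply the transform
                : e));                              // not configured: skip, pass through
  }
  return events;
}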
You can't generate a Beam pipeline at runtime; it has to be known in advance, because before the actual run the runner processes the DAG of your pipeline and translates it into a pipeline for your data processing system (e.g. Spark, Flink, Dataflow, etc.).
At the same time, you can create branches in your pipeline that incorporate different transforms, and route records to specific branches depending on your conditions.
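For example (a minimal sketch; Event, needsEnrichment, and EnrichTransform are hypothetical), a single multi-output ParDo can route each record to the branch that matches its customer configuration, and each branch then applies its own statically declared chain of transforms:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

// One tag per branch; all branches are declared up front, so the DAG stays static.
final TupleTag<Event> enrichTag = new TupleTag<Event>() {};
final TupleTag<Event> plainTag  = new TupleTag<Event>() {};

// events: PCollection<Event> read from the source, as in the question.
PCollectionTuple branches = events.apply("Route",
    ParDo.of(new DoFn<Event, Event>() {
      @ProcessElement
      public void process(@Element Event e, MultiOutputReceiver out) {
        // Hypothetical condition derived from the customer configuration.
        if (e.needsEnrichment()) {
          out.get(enrichTag).output(e);
        } else {
          out.get(plainTag).output(e);
        }
      }
    }).withOutputTags(enrichTag, TupleTagList.of(plainTag)));

// Each branch applies its own transforms; records only flow through the branch they were routed to.
PCollection<Event> enriched = branches.get(enrichTag).apply("Enrich", new EnrichTransform());
PCollection<Event> plain    = branches.get(plainTag);

The full set of branches for all customer profiles has to be declared when the pipeline is constructed; at runtime the routing DoFn only decides which of those pre-built branches each record flows through.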