Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamic pipelines in apache beam

Tags:

apache-beam

enter image description here

I have a generic input request which contains input that needs to be transformed and saved. If the resulting output needs to be transformed, i would implement a new processor (transformer) for it.

class Request {
  Input input;
  Transformer transformer;
  Output output;
}

Based on the message read from the source i generate a graph of required transforms that's depicted in the image, its based on the configuration for each input lets say for different customers. Here the pipeline i generate for every message is dynamic. i.e

Graph<Request> gRequests; 

Child transforms input is dependent on successful completion of parent transform.

Is there any way i can generate an dynamic apache beam pipeline for this? or whats the best approach?

As of now i am flattening every transform into a sequential list of transforms based on input and output and passing through every pipeline step, it performs the transform if its configured or else just skips.

Sequential list of transformers.

enter image description here

For every new transform needed it rearranges the pipeline. Thats the expected flow!!. But if i can chain transforms based on input and my customer configuration It would be a whole lot easier.

like image 573
Raghu Dinka Vijaykumar Avatar asked Nov 18 '25 18:11

Raghu Dinka Vijaykumar


1 Answers

You can't generate Beam pipeline in the runtime - it should be known in advance since before actual run, the runner will process a DAG of your pipeline and translate it into a pipeline of your data processing system (e.g. Spark, Flink, Dataflow, etc).

In the same time, you can leverage branches in your pipeline that incorporate different transforms and process different records with specific branches depending on your conditions.

like image 150
Alexey Romanenko Avatar answered Nov 22 '25 03:11

Alexey Romanenko