I have the task of translating a list of dependent nodes into AWS Step Functions. The Step Functions definition allows parallel branches, and even branches nested several levels deep. Unfortunately, it does not support dependencies between tasks in different branches, so the whole Parallel step has to complete before any of its results are available to subsequent tasks in the state machine.
In my diagram below, a simple parallel branch like the one shown in Graph 1 is easily supported by Step Functions.
When it comes to Graph 2, and especially Graph 3, it becomes a problem.
As a simple approach, we could introduce additional nodes that collect the results for their dependent nodes, as demonstrated in Graphs 2b and 3b, but this introduces dependencies that didn't exist before:
This is a problem because manual approval tasks can take anywhere from hours to days. Later steps would then be delayed unnecessarily by tasks they do not depend on.
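To make the unnecessary delay concrete, here is a minimal Python sketch, not a Step Functions API. It groups a dependency graph into layers, which is what the collector nodes in Graphs 2b and 3b effectively do, and then lists every wait that exists only because of the layer barrier. The graph in `deps` and the function names `layer_dag` and `spurious_waits` are hypothetical; they are not taken from the diagrams.

```python
def layer_dag(deps):
    """Group nodes into layers. deps maps each node to its direct prerequisites."""
    remaining = {node: set(pre) for node, pre in deps.items()}
    layers, done = [], set()
    while remaining:
        # A node is ready once all of its prerequisites are in earlier layers.
        ready = sorted(n for n, pre in remaining.items() if pre <= done)
        if not ready:
            raise ValueError("cycle or unknown prerequisite in graph")
        layers.append(ready)
        done.update(ready)
        for n in ready:
            del remaining[n]
    return layers


def spurious_waits(deps, layers):
    """Return (node, blocker) pairs where node waits on blocker only due to the barrier."""
    ancestors = {}  # node -> all direct and transitive prerequisites
    extra, seen = [], set()
    for layer in layers:
        for n in layer:
            anc = set(deps[n])
            for p in deps[n]:
                anc |= ancestors[p]
            ancestors[n] = anc
            # Everything in earlier layers that is not a real prerequisite.
            extra.extend((n, blocker) for blocker in sorted(seen - anc))
        seen.update(layer)
    return extra


# Hypothetical graph: C needs only A, D needs both A and B.
deps = {"A": set(), "B": set(), "C": {"A"}, "D": {"A", "B"}}
layers = layer_dag(deps)
print(layers)
print(spurious_waits(deps, layers))
```

In this hypothetical graph, if B were a manual approval, C would be held up by it even though C depends only on A.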
Any suggestions on how to solve this? Maybe I could take a different approach? Maybe there is some fancy graph theory algorithm I can apply? I don't even know what words to use to explain this problem in graph theory.
Here is a URL for playing with these graphs on draw.io if you need it.
A Parallel state causes AWS Step Functions to execute each branch, starting with the state named in that branch's StartAt field, as concurrently as possible, and to wait until all branches terminate (reach a terminal state) before processing the Parallel state's Next field.
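As an illustration of that behaviour, the following Python sketch builds a Parallel state as a plain dictionary and prints it as JSON. The helper `parallel_state`, the task names, and the placeholder Lambda ARNs are assumptions made for the example; only the field names (`Type`, `Branches`, `StartAt`, `States`, `Next`, `End`) come from the Amazon States Language.

```python
import json


def parallel_state(branches, next_state):
    """Build a Parallel state; each branch is a list of task names run in sequence."""

    def branch(task_names):
        states = {}
        for i, name in enumerate(task_names):
            state = {
                "Type": "Task",
                # Placeholder ARN, not a real resource.
                "Resource": f"arn:aws:lambda:REGION:ACCOUNT_ID:function:{name}",
            }
            if i + 1 < len(task_names):
                state["Next"] = task_names[i + 1]
            else:
                state["End"] = True  # terminal state of this branch
            states[name] = state
        return {"StartAt": task_names[0], "States": states}

    return {
        "Type": "Parallel",
        "Branches": [branch(b) for b in branches],
        # Only followed after every branch has reached its terminal state.
        "Next": next_state,
    }


state = parallel_state([["A"], ["B", "C"]], "Join")
print(json.dumps(state, indent=2))
```

Note that there is no field through which a state in one branch can point at a state in another branch, which is why a cross-branch dependency cannot be expressed directly.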
Large data sets that need to be processed concurrently sound like a great use for Step Functions. However, if you do the parallel processing via a Map state, the maximum concurrency is 40, meaning the data will be processed in “batches” of 40.
AWS Step Functions is a visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.
Fundamentally, what you are describing is a DAG, while the Amazon States Language is state-machine based. So I don't think there is a solution to your problem.