Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to simplify complex parallel branch interdependencies for Step Functions

I have the task of translating a list of dependent nodes into AWS Step Functions. The AWS Step Function definition allows for parallel branches or even branches nested to multiple levels deep. Unfortunately it does not support dependencies between tasks in the branches and therefore forces you to complete the parallel step before both results are available to subsequent tasks in the step function.

In my diagram delow a simple parallel branch like shown in Graph 1 is easily supported by Step Functions.

enter image description here

When it comes to Graph 2 and especially Graph 3 it becomes a problem.

enter image description here

As a simple approach we could introduce additional nodes to collect the results together for their dependent nodes as demonstrated in Graphs 2b and 3b but this now introduces dependencies that didn't exist before:

  • In Graph 2b these new dependencies have been introduced:
    • E -> A, F -> A
  • In Graph 3b these new dependencies have been introduced:
    • E -> A, F -> A, F-> B, C -> E

enter image description here

This is a problem because in the case of manual approval tasks the time for these tasks could be in the order of hours to days. This would cause later steps to be delayed unnecessarily by tasks that they do not have dependencies on.

Any suggestions on how to solve this? Maybe I could take a different approach? Maybe there is some fancy graph theory algorithm I can apply? I don't even know what words to use to explain this problem in graph theory.

Here is a url for playing with these graphs on draw.io if you need to.

like image 874
Inquisitive Mind Avatar asked Jul 26 '19 13:07

Inquisitive Mind


People also ask

How do you use step function in parallel?

For more information, see Fallback States. A Parallel state causes AWS Step Functions to execute each branch, starting with the state named in that branch's StartAt field, as concurrently as possible, and wait until all branches terminate (reach a terminal state) before processing the Parallel state's Next field.

How many step functions can run concurrently?

Large data sets that need to be processed concurrently sounds like a great use for Step Functions. However, if you are doing parallel processing via a Map state, the max concurrency limit is 40. Meaning you will be processing the data in “batches” of 40.

What are AWS step functions?

AWS Step Functions is a visual workflow service that helps developers use AWS services to build distributed applications, automate processes, orchestrate microservices, and create data and machine learning (ML) pipelines.


1 Answers

Fundamentally what you asked is a DAG while the Amazon States Language is state machine based. So I don't think there is a solution for your problem.

like image 118
wanghq Avatar answered Sep 23 '22 22:09

wanghq