I am trying to use AWS Step Functions to trigger operations on a very large S3 file via Lambda. To do this, I am invoking a step function with an input that contains the S3 key of the file and a list of byte ranges for that file (each parallel iteration would operate on a different section of the file). The input looks something like this:
{
  "job-spec": {
    "file": "some_s3_key",
    "array": [
      "0-100",
      "101-200",
      "201-300", ...
    ]
  }
}
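For illustration, an input of this shape could be generated with a small helper like the sketch below. The function name `build_job_spec` and its parameters are hypothetical, and it assumes equal-sized, inclusive `start-end` ranges, which differs slightly from the hand-written ranges in the example above.

```python
import json


def build_job_spec(file_key, file_size, chunk_size):
    """Split a file of file_size bytes into inclusive "start-end" byte ranges.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    if file_size < 0 or chunk_size <= 0:
        raise ValueError("file_size must be >= 0 and chunk_size must be > 0")

    ranges = []
    start = 0
    while start < file_size:
        # The end offset is inclusive, so the last byte of a chunk is start + chunk_size - 1.
        end = min(start + chunk_size, file_size) - 1
        ranges.append(f"{start}-{end}")
        start = end + 1

    return {"job-spec": {"file": file_key, "array": ranges}}


if __name__ == "__main__":
    # Print the execution input for a 250-byte file split into 100-byte chunks.
    print(json.dumps(build_job_spec("some_s3_key", 250, 100), indent=2))
```

The resulting dictionary can be serialized with `json.dumps` and passed as the execution input.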
My step function is very simple: it takes that input and maps over it. However, I can't seem to get both the file and the array as input to my Lambda. Here is my step function definition:
{
  "Comment": "An example of the Amazon States Language using a map state to process elements of an array with a max concurrency of 2.",
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "ItemsPath": "$.job-spec",
      "ResultPath": "$.array",
      "MaxConcurrency": 2,
      "Next": "Final State",
      "Iterator": {
        "StartAt": "My Stage",
        "States": {
          "My Stage": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {
              "FunctionName": "arn:aws:lambda:us-east-1:<>:function:some-lambda:$LATEST",
              "Payload": {
                "Input.$": "$.array"
              }
            },
            "End": true
          }
        }
      }
    },
    "Final State": {
      "Type": "Pass",
      "End": true
    }
  }
}
As written above, it complains that job-spec is not an array for the ItemsPath. If I change that to $.job-spec.array, I get the array I'm looking for in my Lambda, but the file key is missing. I tried joining the two together with a |, but I hit a limit on how much data I can pass around in Step Functions.
Essentially, I want each Python Lambda to get the file key and one entry from the array.
It looks like the Parameters value can be used for this, but I can't quite get the syntax right.
Step Functions paths use JsonPath syntax. To specify that a parameter uses a path to reference a JSON node in the input, end the parameter name with .$. For example, if you have text in your state input in a node named message, you could pass that to a parameter by referencing the input JSON with a path.
The Map state ( "Type": "Map" ) can be used to run a set of steps for each element of an input array. While the Parallel state executes multiple branches of steps using the same input, a Map state will execute the same steps for multiple entries of an array in the state input.
The ResultSelector field lets you create a collection of key value pairs, where the values are static or selected from the state's result. Using the ResultSelector field, you can choose what parts of a state's result you want to pass to the ResultPath field.
Step Functions is based on state machines and tasks. A state machine is a workflow. A task is a state in a workflow that represents a single unit of work that another AWS service performs. Each step in a workflow is a state.
I was finally able to get the syntax right:
"ItemsPath": "$.job-spec.array",
"Parameters": {
  "byte_array.$": "$$.Map.Item.Value",
  "file.$": "$.job-spec.file"
},
It seems that Parameters can be used to create custom inputs for each stage. The $$ prefix accesses the context of the stage rather than the actual input. It appears that ItemsPath takes the array and exposes each element through that context (as Map.Item.Value), where it can be used later.
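To make that behaviour concrete, here is a rough Python model of how the Map state appears to build each iteration's input from `ItemsPath` and `Parameters`. It is only a sketch for intuition, not how Step Functions is implemented: the function name `simulate_map_inputs` is made up, and the two paths from the snippet above are hard-coded rather than evaluated as real JSONPath.

```python
def simulate_map_inputs(state_input):
    """Model the per-iteration inputs produced by the Map state above.

    Hypothetical helper: the paths are hard-coded instead of being parsed.
    """
    # "ItemsPath": "$.job-spec.array" selects the array to iterate over.
    items = state_input["job-spec"]["array"]

    iteration_inputs = []
    for index, item in enumerate(items):
        # During each iteration the context object ($$) carries the current item.
        context = {"Map": {"Item": {"Index": index, "Value": item}}}

        iteration_inputs.append({
            # "byte_array.$": "$$.Map.Item.Value" reads from the context object.
            "byte_array": context["Map"]["Item"]["Value"],
            # "file.$": "$.job-spec.file" reads from the Map state's own input.
            "file": state_input["job-spec"]["file"],
        })
    return iteration_inputs


if __name__ == "__main__":
    example = {"job-spec": {"file": "some_s3_key", "array": ["0-100", "101-200"]}}
    for iteration_input in simulate_map_inputs(example):
        print(iteration_input)
```

Each iteration therefore sees a small object containing the shared file key plus exactly one byte range, instead of the whole array.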
UPDATE: Here is some AWS documentation showing this being used, from the comments below.
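On the Lambda side, a handler consuming this input might look like the sketch below. It rests on several assumptions that are not in the original post: the task's Payload is assumed to pass the whole iteration input through (for example with "Payload.$": "$"), so the event has the keys `byte_array` and `file`; the bucket name is assumed to come from a hypothetical `BUCKET_NAME` environment variable; and the optional `s3_client` argument exists only so the function can be exercised without AWS access.

```python
import os


def to_range_header(byte_range):
    """Convert a "start-end" string such as "0-100" into an HTTP Range header value."""
    start_text, end_text = byte_range.split("-")
    start, end = int(start_text), int(end_text)
    if start < 0 or end < start:
        raise ValueError(f"invalid byte range: {byte_range!r}")
    return f"bytes={start}-{end}"


def handler(event, context, s3_client=None):
    """Read one byte range of the S3 object named in the event.

    Assumes event == {"byte_array": "<start>-<end>", "file": "<s3 key>"}.
    """
    if s3_client is None:
        # Imported lazily so the helper above can be tested without boto3 installed.
        import boto3
        s3_client = boto3.client("s3")

    response = s3_client.get_object(
        Bucket=os.environ["BUCKET_NAME"],  # hypothetical environment variable
        Key=event["file"],
        Range=to_range_header(event["byte_array"]),
    )
    chunk = response["Body"].read()

    # Return only a small summary so the Map state's result stays well under payload limits.
    return {
        "file": event["file"],
        "byte_array": event["byte_array"],
        "bytes_read": len(chunk),
    }
```

Returning a summary rather than the bytes themselves is a deliberate choice here, since the question already ran into the limit on how much data can be passed between states.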