I'm learning about Step Functions and, specifically, I'm trying to figure out how to trigger a state machine execution with an S3 event. I was reading this post: https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-cloudwatch-events-s3.html. The documentation gives a rough guide on how to configure things, but I'm still unclear about the following questions:
What does the state machine input mean? The documentation doesn't explain much about what each field in the input means. Does anyone know where to find the documentation for it? For example, what's the id field in this input?
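For reference, the auto-generated input from the tutorial looks roughly like this (trimmed, with placeholder values; it's the standard CloudWatch Events event envelope wrapping a CloudTrail record):

{
    "version": "0",
    "id": "aa111a11-1a1a-1a1a-1a1a-111111111111",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.s3",
    "account": "123456789012",
    "time": "2019-01-01T00:00:00Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventSource": "s3.amazonaws.com",
        "eventName": "PutObject",
        "requestParameters": {
            "bucketName": "example-bucket",
            "key": "example-key"
        }
    }
}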
How does a Java Lambda function retrieve useful information from the input? I saw documentation about how to manipulate input that was predefined in the state machine schema (CloudFormation or Amazon States Language), but not for input that was auto-generated by an S3 event.
Has anyone built similar functionality before using a state machine + S3 events? Any thoughts would be appreciated.
You can execute an AWS Step Functions state machine in response to an Amazon EventBridge event, or on a schedule.
As an alternative to AWS Step Functions, you can schedule AWS Lambda functions: simple workflows (consisting mainly of one Lambda function) can be run by incorporating the workflow logic into a Lambda function and triggering it with an AWS Lambda schedule event.
AWS Step Functions is useful for any engineering team that needs to build workflows across multiple Amazon services. Use cases for Step Functions vary widely, from orchestrating serverless microservices, to building data-processing pipelines, to defining a security-incident response.
In principle, yes, this event-driven setup is reliable. However, Lambda has a 99.9% SLA and S3 has a 99.9% uptime SLA as well, so in theory some events could be missed, but only during a service disruption. When the Lambda function fails, the asynchronous invocation is automatically retried, for up to three attempts in total.
We had a similar task, starting a StepFunctions state machine from an S3 event, with a small modification: we wanted to start different state machines based on the extension of the uploaded file.
Initially we followed the same tutorial that you're referring to: a CloudTrail rule with a StepFunctions state machine as the target.
But later we realized that we can't really filter S3 events by file extension that way (at least we could not find a way).
Finally we solved it in a different way: an S3 event notification (filtered by key suffix) triggers a Lambda function, and that function starts the appropriate state machine.
This is a bit more complex than the CloudTrail solution, as we deploy an additional Lambda function, but we can filter S3 events as we need, and we have full control over what is fed to the state machine. So I think this solution is more flexible than the CloudTrail one.
I will now share some details of our solution. I had to cut our code heavily, so no guarantees that this will work out of the box, but hopefully it should be enough for you to get the idea.
Bucket for uploads:

UploadsInboundBucket:
  Type: 'AWS::S3::Bucket'
  Properties:
    BucketName: !Sub >-
      ${AWS::AccountId}-uploads-inbound-bucket
    NotificationConfiguration:
      LambdaConfigurations:
        - Function: !GetAtt StartVideoclipStateMachineExecutionFunction.Arn
          Event: 's3:ObjectCreated:*'
          Filter:
            S3Key:
              Rules:
                - Name: suffix
                  Value: mp4
        - Function: !GetAtt StartVideoStateMachineExecutionFunction.Arn
          Event: 's3:ObjectCreated:*'
          Filter:
            S3Key:
              Rules:
                - Name: suffix
                  Value: json
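One gotcha worth mentioning (not shown above, and the resource names here are my placeholders): S3 can only invoke the functions if each carries a resource-based permission, roughly like this (and the same for the second function). Using SourceAccount instead of SourceArn also avoids a circular dependency on the bucket:

StartVideoclipStateMachineExecutionFunctionPermission:
  Type: 'AWS::Lambda::Permission'
  Properties:
    Action: 'lambda:InvokeFunction'
    FunctionName: !Ref StartVideoclipStateMachineExecutionFunction
    Principal: s3.amazonaws.com
    SourceAccount: !Ref 'AWS::AccountId'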
s3:ObjectCreated:* triggers (depending on the suffix of the object key) one of the two Lambda functions, StartVideoclipStateMachineExecutionFunction or StartVideoStateMachineExecutionFunction.
The S3 event which is fed to the lambda function is described here in great detail: https://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html
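For orientation, here is a trimmed sketch of that structure, cut down to the fields the code below actually touches (bucket name, key and size are placeholders):

{
    "Records": [
        {
            "eventSource": "aws:s3",
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {
                    "name": "123456789012-uploads-inbound-bucket",
                    "arn": "arn:aws:s3:::123456789012-uploads-inbound-bucket"
                },
                "object": {
                    "key": "clips/example.mp4",
                    "size": 1024
                }
            }
        }
    ]
}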
The Lambda function simply parses the input, builds the state machine input, and starts the state machine:
// Handler: reads the target state machine ARN from an environment variable
// and starts an execution for the incoming S3 event.
var stepfunction = require('./stepfunction');
var aws = require('aws-sdk');
var parser = require('./parser');

exports.handler = (event, context, callback) => {
    var statemachineArn = process.env.statemachine_arn;
    var stepfunctions = new aws.StepFunctions();
    stepfunction.startExecutionFromS3Event(stepfunctions, parser, statemachineArn, event);
    callback(null, event);
};
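One caveat with the snippet above: startExecution is asynchronous, and the handler signals success via callback before the API call has necessarily returned (it still works because Node.js Lambda waits for the event loop to drain by default, but failures are effectively swallowed). A minimal variant, assuming you want the invocation to fail (and S3 to retry automatically) when StartExecution fails, is to complete only from its callback; the requires are the same as above:

exports.handler = (event, context, callback) => {
    var stepfunctions = new aws.StepFunctions();
    var params = {
        stateMachineArn: process.env.statemachine_arn,
        input: JSON.stringify({
            detail: { resources: [{ type: 'AWS::S3::Object', ARN: parser.parseEvent(event) }] }
        })
    };
    // Signal completion only after the StartExecution call has returned,
    // so an error fails the invocation and triggers S3's automatic retries.
    stepfunctions.startExecution(params, callback);
};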
Parse the S3 event:

module.exports = {
    // Build '<bucket ARN>/<object key>' from the first record of the event.
    parseEvent: function (event) {
        return event.Records[0].s3.bucket.arn + '/' + event.Records[0].s3.object.key;
    }
};
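One detail worth knowing: the object key arrives URL-encoded in the event (a space becomes '+', for example). If your keys can contain such characters, a variant of parseEvent that decodes first may be safer; a sketch:

module.exports = {
    parseEvent: function (event) {
        var record = event.Records[0];
        // S3 delivers the key URL-encoded, e.g. 'my clip.mp4' as 'my+clip.mp4'.
        var key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
        return record.s3.bucket.arn + '/' + key;
    }
};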
Start state machine execution:

module.exports = {
    startExecutionFromS3Event: function (stepfunctions, parser, statemachineArn, event) {
        // Get the affected S3 object from the event
        var arn = parser.parseEvent(event);

        // Create the input for the state machine execution
        var input = {
            "detail": {
                "resources": [
                    {
                        "type": "AWS::S3::Object",
                        "ARN": arn
                    }
                ]
            }
        };

        // Start the state machine execution
        var params = {
            stateMachineArn: statemachineArn,
            input: JSON.stringify(input)
        };
        stepfunctions.startExecution(params, function (err, data) {
            if (err) {
                console.log('err while executing step function');
                console.log(JSON.stringify(err));
            } else {
                console.log('started execution of step function');
            }
        });
    }
};
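On the state machine side, this object becomes the execution input, so the fields mean whatever you put into them (we deliberately mimic the CloudWatch Events format with detail.resources). A hypothetical Task state that forwards just the S3 object ARN to a worker Lambda function (the function name and ARN are placeholders) could look like:

{
    "StartAt": "ProcessUpload",
    "States": {
        "ProcessUpload": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ProcessUpload",
            "InputPath": "$.detail.resources[0]",
            "End": true
        }
    }
}

A Java handler for ProcessUpload would then simply receive the selected { "type": ..., "ARN": ... } object as its input, for example as a Map<String, String> or a small POJO.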
You also need a ton of IAM roles and permissions to make all of this work (for example, the Lambda function must be allowed to start the state machine), but I'll omit those here.
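Just to illustrate the one permission mentioned above (the role and state machine references are placeholders), the starter function's execution role needs something along these lines:

StartExecutionPolicy:
  Type: 'AWS::IAM::Policy'
  Properties:
    PolicyName: start-state-machine-execution
    Roles:
      - !Ref StarterFunctionRole   # placeholder: the Lambda function's execution role
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Action: 'states:StartExecution'
          Resource: !Ref VideoclipStateMachine   # !Ref on a state machine returns its ARN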