Does anyone have experience triggering a Step Functions state machine with an S3 event?

I'm learning about Step Functions and, specifically, I'm trying to figure out how to trigger a state machine execution with an S3 event. I was reading this tutorial: https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-cloudwatch-events-s3.html. It gives a rough guide to the configuration, but I'm still unclear about the following:

  1. What does the state machine input mean? The documentation doesn't explain what each field in the input means. Does anyone know where to find documentation for it? For example, what is the id field in this input?

  2. How does a Java Lambda retrieve useful information from the input? I found documentation on manipulating input that is predefined in the state machine definition (CloudFormation or Amazon States Language), but not on input that is auto-generated by an S3 event.

Has anyone built similar functionality using a state machine plus an S3 event? Any thoughts would be appreciated.

OD Street asked Jul 17 '19 23:07



1 Answer

We had a similar task, starting a Step Functions state machine from an S3 event, with one twist: we wanted to start different state machines depending on the extension of the uploaded file.

Initially we followed the same tutorial you're referring to: a CloudWatch Events rule, fed by CloudTrail, with the Step Functions state machine as its target.

But later we realized that we couldn't really filter S3 events by file extension that way (at least we could not find a way).

In the end we solved it differently:

  • The S3 bucket is configured with notifications that trigger specific Lambda functions for specific S3 object key suffixes (file extensions, if you will).
  • The Lambda function gets the S3 event as input, transforms it as required, and starts the Step Functions state machine with the transformed input.
  • The Step Functions state machine starts with the input that the Lambda function created and executes as usual.

This is a bit more complex than the CloudTrail solution because we deploy an additional Lambda function. But we can filter S3 events as we need, and we have full control over what is fed to the state machine. So I think this solution is more flexible than the CloudTrail solution.

I will now share some details of our solution. I've had to cut our code down severely, so no guarantees that this will work out of the box, but hopefully it is enough to give you the idea.


Bucket for uploads

  UploadsInboundBucket:
    Type: 'AWS::S3::Bucket'
    Properties:
      BucketName: !Sub >-
        ${AWS::AccountId}-uploads-inbound-bucket
      NotificationConfiguration:
        LambdaConfigurations:
          - Function: !GetAtt StartVideoclipStateMachineExecutionFunction.Arn
            Event: 's3:ObjectCreated:*'
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: mp4
          - Function: !GetAtt StartVideoStateMachineExecutionFunction.Arn
            Event: 's3:ObjectCreated:*'
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: json

An s3:ObjectCreated:* event triggers, depending on the suffix of the object key, one of two Lambda functions: StartVideoclipStateMachineExecutionFunction or StartVideoStateMachineExecutionFunction.

The S3 event that is fed to the Lambda function is described in great detail here: https://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html
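To make that structure a bit more concrete, here is a minimal sketch of pulling the bucket and key out of an event. The sample event is heavily abbreviated, its bucket name and key are made up, and describeRecord is a hypothetical helper; only the field paths (Records[].s3.bucket.name, Records[].s3.bucket.arn, Records[].s3.object.key) come from the S3 documentation. Object keys arrive URL-encoded, which is why the sketch decodes them.

```javascript
// Abbreviated sample event; the bucket name and key are made-up values.
const sampleEvent = {
  Records: [
    {
      eventSource: 'aws:s3',
      eventName: 'ObjectCreated:Put',
      s3: {
        bucket: { name: 'example-bucket', arn: 'arn:aws:s3:::example-bucket' },
        object: { key: 'uploads/my+clip.mp4', size: 1024 }
      }
    }
  ]
};

// describeRecord is a hypothetical helper, not part of any AWS SDK.
function describeRecord(record) {
  // Keys are URL-encoded in the event, with spaces sent as '+'.
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  return {
    bucketName: record.s3.bucket.name,
    bucketArn: record.s3.bucket.arn,
    key: key
  };
}

// An event can carry several records, so map over all of them.
const described = sampleEvent.Records.map(describeRecord);
console.log(JSON.stringify(described, null, 2));
```

Iterating over Records rather than reading only Records[0] is a design choice in this sketch; whether you need it depends on how your notifications are batched.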

The Lambda function simply parses the event, builds the state machine input, and starts the state machine.

var stepfunction = require('./stepfunction');
var aws = require('aws-sdk');
var parser = require('./parser');

exports.handler = (event, context, callback) => {

  // ARN of the target state machine, passed in through an environment variable
  var statemachineArn = process.env.statemachine_arn;
  var stepfunctions = new aws.StepFunctions();

  // Build the state machine input from the S3 event and start the execution
  stepfunction.startExecutionFromS3Event(stepfunctions, parser, statemachineArn, event);

  callback(null, event);

};

Parse the S3 event:

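module.exports = {
  // Returns the ARN of the uploaded object: <bucket ARN>/<object key>
  parseEvent : function(event)
  {
    var s3 = event.Records[0].s3;
    // Object keys in S3 events are URL-encoded, with spaces sent as '+'
    var key = decodeURIComponent(s3.object.key.replace(/\+/g, ' '));
    return s3.bucket.arn + '/' + key;
  }
};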

Start state machine execution:

module.exports = {
    startExecutionFromS3Event : function(stepfunctions, parser, statemachineArn , event)
    {
        //get affected S3 object from Event
        var arn = parser.parseEvent(event);

        //Create input for Step
        var input = {
            "detail" : {
                "resources" : [
                    {
                        "type": "AWS::S3::Object",
                        "ARN": arn
                    }
                ]
            }
        };
    
        //start step function execution
        var params = {
            stateMachineArn: statemachineArn,
            input: JSON.stringify(input)
        };


        stepfunctions.startExecution(params, function (err, data) {
            if (err) {
                console.error('err while executing step function');
                console.error(JSON.stringify(err));
            } else {
                console.log('started execution of step function');
            }
        });
    
    }
}
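
One possible variation, not what we actually deployed, is to return the promise from startExecution so that the handler can await it and a failed start surfaces as a Lambda error instead of only being logged. The sketch below assumes the aws-sdk v2 client shape, where startExecution(params).promise() returns a promise. buildInput, startExecution, the stub client, and both ARNs are hypothetical and exist only so the example runs without AWS credentials.

```javascript
// buildInput and startExecution are hypothetical helpers for this sketch.
function buildInput(objectArn) {
  return {
    detail: {
      resources: [{ type: 'AWS::S3::Object', ARN: objectArn }]
    }
  };
}

// `client` is anything exposing startExecution(params).promise(),
// which is the shape of the aws-sdk v2 StepFunctions client.
async function startExecution(client, stateMachineArn, objectArn) {
  const params = {
    stateMachineArn: stateMachineArn,
    input: JSON.stringify(buildInput(objectArn))
  };
  return client.startExecution(params).promise();
}

// Stub client so the sketch runs without AWS credentials.
const calls = [];
const stubClient = {
  startExecution(params) {
    calls.push(params);
    return { promise: () => Promise.resolve({ executionArn: 'stub-execution-arn' }) };
  }
};

const pending = startExecution(
  stubClient,
  'arn:aws:states:us-east-1:123456789012:stateMachine:Example', // made-up ARN
  'arn:aws:s3:::example-bucket/uploads/clip.mp4'                // made-up ARN
);
pending.then((result) => console.log('started', result.executionArn));
```

In a real handler you would pass new aws.StepFunctions() instead of the stub and await the returned promise inside an async exports.handler.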

You also need a ton of IAM roles and permissions to make all of this work (for example, the Lambda function must be allowed to start the state machine), but I'll omit them here.
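
As a rough illustration of the two permissions that matter most, and not our actual setup, the sketch below builds them as plain objects: an identity policy allowing states:StartExecution on one state machine, and a resource-based permission that lets S3 invoke the function. Both helper names, the statement ID, the function name, and the ARNs are made up; the second object uses the field names of the Lambda AddPermission API. A real deployment will likely need more than this (logging, for instance).

```javascript
// Both helpers are hypothetical; they only build plain objects.

// Identity policy for the Lambda's execution role:
// allow starting executions of one specific state machine.
function startExecutionPolicy(stateMachineArn) {
  return {
    Version: '2012-10-17',
    Statement: [
      { Effect: 'Allow', Action: 'states:StartExecution', Resource: stateMachineArn }
    ]
  };
}

// Resource-based permission letting S3 invoke the function for one bucket.
// Field names follow the Lambda AddPermission API.
function s3InvokePermission(functionName, bucketArn) {
  return {
    FunctionName: functionName,
    StatementId: 'AllowS3Invoke', // arbitrary identifier chosen for this sketch
    Action: 'lambda:InvokeFunction',
    Principal: 's3.amazonaws.com',
    SourceArn: bucketArn
  };
}

// Made-up ARNs and function name, for illustration only.
const policy = startExecutionPolicy('arn:aws:states:us-east-1:123456789012:stateMachine:Example');
const permission = s3InvokePermission('start-execution-fn', 'arn:aws:s3:::example-bucket');
console.log(JSON.stringify({ policy, permission }, null, 2));
```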

lexicore answered Sep 20 '22 14:09