
How to return output from AWS Glue jobs back to the calling Step Function workflow?

AWS Step Functions allow calling AWS Glue jobs, as described here: https://docs.aws.amazon.com/step-functions/latest/dg/connect-glue.html

I want to run the job and, after saving the results to S3, return some metadata produced during the job (such as the row count or the number of filtered rows) to the Step Functions workflow.

We can pass parameters from Step Functions to the Glue job like this:

              "RunGlueJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {
                  "JobName": "MyJobName",
                  "Arguments": {
                    "--param1.$": "$.param1",
                    "--param2.$": "$.param2"
                  }
                },
                "Next": "NextState"
              },

But how can the Glue job return output to the Step Functions workflow? I tried returning a String from the main() function inside the (Scala) Glue job, but it doesn't show up in the JSON returned to the workflow:

    {
      "AllocatedCapacity": 3,
      "Arguments": {
        "--param1.$": "$.param1",
        "--param2.$": "$.param2"
      },
      "Attempt": 0,
      "CompletedOn": 1570114802442,
      "ExecutionTime": 39,
      "GlueVersion": "0.9",
      "Id": "jr_some_id",
      "JobName": "MyJobName",
      "JobRunState": "SUCCEEDED",
      "LastModifiedOn": 1570114802442,
      "LogGroupName": "/aws-glue/jobs",
      "MaxCapacity": 3,
      "PredecessorRuns": [],
      "StartedOn": 1570114746138,
      "Timeout": 2880
    }

I cannot find any documentation on this, so it may simply not be possible. However, returning values from Lambda functions works fine, and they show up as expected inside the Step Functions workflow.

Turiphro asked Oct 15 '22 10:10


1 Answer

You can't return anything from a Glue job at this stage. By design, AWS Glue is meant to process large amounts of data, so its output is expected to be large as well.

Instead, you can write the result to DynamoDB, S3, or any other storage, and read it with a Lambda function in the next step of the Step Functions workflow.
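The workaround could be sketched roughly as follows, using S3 as the storage. This is only a minimal sketch under several assumptions: the Glue job is written in Python rather than Scala, and the helper names (`write_job_metadata`, `read_job_metadata`) and the event fields `metadata_bucket` and `metadata_key` are hypothetical names chosen for illustration, not part of any AWS API. Only `put_object` and `get_object` are real boto3 S3 client calls.

```python
import json


def write_job_metadata(s3_client, bucket, key, metadata):
    """Glue job side (hypothetical helper): call this after the main
    results are saved, to store a small JSON summary of the run."""
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(metadata).encode("utf-8"),
    )


def read_job_metadata(s3_client, bucket, key):
    """Lambda side (hypothetical helper): load the summary written above."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


def lambda_handler(event, context):
    """Lambda task placed after the Glue task in the state machine.

    Assumes the state input carries 'metadata_bucket' and 'metadata_key'
    (hypothetical field names) pointing at the file the job wrote.
    """
    import boto3  # imported here so the helpers above work without boto3

    s3_client = boto3.client("s3")
    # The returned dict becomes the output of the Lambda task.
    return read_job_metadata(
        s3_client, event["metadata_bucket"], event["metadata_key"]
    )
```

Inside the Glue job this might be called as `write_job_metadata(boto3.client("s3"), bucket, key, {"row_count": 1000, "filtered_rows": 42})`, with the bucket and key passed in as job arguments from the state machine. Since a task's result replaces the state input by default, the Glue task would probably also need a `ResultPath` so that the bucket and key are still present in the input of the Lambda step.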

Sandeep Fatangare answered Oct 20 '22 06:10