Currently I'm using an AWS Glue job to load data into Redshift, but after that load I need to run some data cleansing tasks, probably using an AWS Lambda function. Is there any way to trigger a Lambda function at the end of a Glue job? Lambda functions can be triggered by SNS messages, but I couldn't find a way to send an SNS message at the end of the Glue job.
You can trigger a Lambda function on DynamoDB table updates by subscribing your Lambda function to the DynamoDB Stream associated with the table. You can associate a DynamoDB Stream with a Lambda function using the Amazon DynamoDB console, the AWS Lambda console, or Lambda's registerEventSource API.
@oreoluwa is right, this can be done using CloudWatch Events.
From the CloudWatch dashboard:
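For anyone who prefers to script this instead of clicking through the console, here is a minimal sketch of an equivalent rule created with boto3. It assumes `events_client` is `boto3.client("events")`; the function name, the rule name, the target id `cleanup-lambda` and the Lambda ARN are placeholders I made up for illustration.

```python
import json

# Event pattern that matches only successful Glue job runs.
GLUE_SUCCESS_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["SUCCEEDED"]},
}


def create_glue_success_rule(events_client, rule_name, lambda_arn):
    """Create a rule for successful Glue runs and target a Lambda function.

    events_client is expected to be boto3.client("events"); rule_name and
    lambda_arn are hypothetical values supplied by the caller.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(GLUE_SUCCESS_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "cleanup-lambda" is an arbitrary target id chosen for this sketch.
        Targets=[{"Id": "cleanup-lambda", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```

Note that the Lambda function also needs a resource-based permission that allows `events.amazonaws.com` to invoke it. The console adds that for you, a script has to do it separately.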
The event you'll get in Lambda will be of the format:
{
    'version': '0',
    'id': 'a9bc90be-xx00-03e0-9bc5-a0a0a0a0a0a0',
    'detail-type': 'Glue Job State Change',
    'source': 'aws.glue',
    'account': 'xxxxxxxxxx',
    'time': '2018-05-10T16:17:03Z',
    'region': 'us-east-2',
    'resources': [],
    'detail': {
        'jobName': 'xxxx_myjobname_yyyy',
        'severity': 'INFO',
        'state': 'SUCCEEDED',
        'jobRunId': 'jr_565465465446788dfdsdf546545454654546546465454654',
        'message': 'Job run succeeded'
    }
}
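On the Lambda side, a handler for this event could look roughly like the sketch below. It only reads fields that appear in the sample event; `EXPECTED_JOB` and `run_cleanup` are hypothetical names standing in for your own job name and cleansing logic.

```python
# Hypothetical: the Glue job whose completion should start the cleanup.
EXPECTED_JOB = "xxxx_myjobname_yyyy"


def run_cleanup(job_run_id):
    """Placeholder for the real data cleansing step (e.g. SQL against Redshift)."""
    return "cleaned after " + job_run_id


def lambda_handler(event, context):
    """Run the cleanup only for successful runs of the expected Glue job."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.glue":
        return {"action": "ignored", "reason": "not a Glue event"}
    if detail.get("jobName") != EXPECTED_JOB:
        return {"action": "ignored", "reason": "different job"}
    if detail.get("state") != "SUCCEEDED":
        return {"action": "ignored", "reason": "state is " + str(detail.get("state"))}
    result = run_cleanup(detail["jobRunId"])
    return {"action": "cleaned", "result": result}
```

Checking `jobName` in the handler is optional if the rule already filters on it, but it is a cheap guard when one rule covers several jobs.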
Since AWS Glue supports Python, you can invoke the Lambda function directly from the job script using boto3. The sample script below shows how to do that:
import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

## Do all ETL stuff here

## Once the ETL completes, commit the job and invoke the Lambda function
job.commit()
lambda_client = boto3.client('lambda')
## Replace 'string' with the name or ARN of your Lambda function
response = lambda_client.invoke(FunctionName='string')
Make sure that the IAM role attached to the Glue job has permission to invoke the Lambda function (the lambda:InvokeFunction action).
Refer to the Boto3 documentation for the Lambda client for the full set of invoke parameters.
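If the Glue job should not wait for the cleanup to finish, the invocation can be made asynchronous and given a payload. This is only a sketch: `notify_lambda` and the payload fields are my own naming, and `lambda_client` is assumed to be `boto3.client("lambda")`.

```python
import json


def notify_lambda(lambda_client, function_name, job_name):
    """Fire-and-forget invocation telling the Lambda which job just finished.

    lambda_client is expected to be boto3.client("lambda"); function_name is
    the name or ARN of your function (hypothetical here).
    """
    payload = {"jobName": job_name, "state": "SUCCEEDED"}
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous: do not wait for the result
        Payload=json.dumps(payload).encode("utf-8"),
    )
    # For asynchronous invocations Lambda answers with HTTP status 202.
    return response["StatusCode"]
```

In the Glue script this would be called after the ETL step, for example `notify_lambda(boto3.client('lambda'), 'my-cleanup-function', args['JOB_NAME'])`, where the function name is a placeholder.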
No. Currently you can't trigger a Lambda function at the end of a Glue job. The reason is that AWS has not yet provided this trigger in Lambda. If you look at the list of triggers after you create a Lambda function, you will see most AWS services there, but not AWS Glue. So for now it is not possible, but it may be in the future.
That said, you can control the flow of Glue jobs from a Lambda function. (I did it in Python; I am sure other languages support this too.) My use case was that whenever an object was uploaded to an S3 bucket, it triggered a Lambda function, which read the object and started my Glue job. Once the Glue job's status was complete, I wrote my file back to the S3 bucket linked to this Lambda function.
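A rough sketch of that flow with boto3 is below. It is not my exact code: the function names and the `--source_bucket` / `--source_key` job arguments are made up for illustration, and `glue_client` is assumed to be `boto3.client("glue")`.

```python
from urllib.parse import unquote_plus


def start_job_for_upload(glue_client, job_name, s3_event):
    """Start a Glue job run for the first object in an S3 upload event."""
    record = s3_event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(record["object"]["key"])
    response = glue_client.start_job_run(
        JobName=job_name,
        # Hypothetical job arguments; the Glue script would read them
        # with getResolvedOptions.
        Arguments={"--source_bucket": bucket, "--source_key": key},
    )
    return response["JobRunId"]


def job_state(glue_client, job_name, run_id):
    """Return the current state of a run, e.g. RUNNING, SUCCEEDED or FAILED."""
    response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
    return response["JobRun"]["JobRunState"]
```

Polling `job_state` inside the same Lambda invocation only works for short jobs, because a Lambda function has a maximum run time; for longer jobs, checking the state from a separate, later invocation is safer.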