I'm planning to write certain jobs in AWS Glue ETL using Pyspark, which I want to get triggered as and when a new file is dropped in an AWS S3 Location, just like we do for triggering AWS Lambda Functions using S3 Events.
But, I see very narrowed down options only, to trigger a Glue ETL script. Any help on this shall be highly appreciated.
The following should work to trigger a Glue job from AWS Lambda. Have the lambda configured to the appropriate S3 bucket, and IAM roles / permissions assigned to AWS Lambda so that lambda can start the AWS Glue job on behalf of the user.
import boto3
print('Loading function')
def lambda_handler(_event, _context):
glue = boto3.client('glue')
gluejobname = "YOUR GLUE JOB NAME"
try:
runId = glue.start_job_run(JobName=gluejobname)
status = glue.get_job_run(JobName=gluejobname, RunId=runId['JobRunId'])
print("Job Status : ", status['JobRun']['JobRunState'])
except Exception as e:
print(e)
raise
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With