
Waiting on Crawlers and Jobs as Dependencies for Glue Job Triggers

Tags:

aws-glue

I'm trying to figure out how to orchestrate a job that has upstream dependencies on crawlers as well as glue jobs.

Currently, AWS Glue Job Triggers support completion of other jobs, but not crawlers. If I wanted a job to execute after PrevJobA and CrawlerB finished, has anyone found a good way to do so?

From another question ("How to kick off AWS Glue Job when Crawler Completes"), it appears that crawlers emit CloudWatch Events. Is it possible for the crawler to fake being a job by having a Lambda send an event on its behalf?

user2740775 asked Mar 05 '23 23:03


1 Answer

Unfortunately, there is no built-in option to set dependencies between Glue crawlers and jobs. However, you can orchestrate this with Step Functions and Lambdas, or automate it with CloudWatch Events and Lambdas.

The first option is more flexible and clearer, since you build a workflow with steps of any complexity that you can monitor. Crawlers and jobs are triggered through the AWS SDK by calling the Glue API. BTW, AWS recently announced native Step Functions support for invoking Glue jobs, which eliminates the need for one or two of those Lambdas.
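To make the Step Functions option concrete, here is a minimal sketch that builds an Amazon States Language definition in Python. It assumes the names from the question (PrevJobA, CrawlerB's dependency role) plus things that are not in the original post: a final job name passed in by the caller, and two hypothetical Lambdas, one that starts the crawler and one that returns `{"state": ...}` with the crawler's current state (Glue reports `READY` once a crawler is idle again). Only the `glue:startJobRun.sync` resource is the native Glue integration; the rest is illustrative.

```python
import json


def build_definition(prev_job, start_crawler_arn, crawler_status_arn, final_job):
    """Return a state machine definition (as a dict) that runs final_job
    only after both prev_job and the crawler have finished."""
    job_branch = {
        "StartAt": "RunPrevJob",
        "States": {
            "RunPrevJob": {
                "Type": "Task",
                # Native Glue integration: waits for the job run to complete.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": prev_job},
                "End": True,
            }
        },
    }
    crawler_branch = {
        "StartAt": "StartCrawler",
        "States": {
            # Hypothetical Lambda that calls glue.start_crawler(...)
            "StartCrawler": {
                "Type": "Task",
                "Resource": start_crawler_arn,
                "Next": "WaitForCrawler",
            },
            "WaitForCrawler": {"Type": "Wait", "Seconds": 30, "Next": "CheckCrawler"},
            # Hypothetical Lambda that returns {"state": "<crawler state>"}
            "CheckCrawler": {
                "Type": "Task",
                "Resource": crawler_status_arn,
                "Next": "IsCrawlerDone",
            },
            "IsCrawlerDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.state", "StringEquals": "READY", "Next": "CrawlerFinished"}
                ],
                "Default": "WaitForCrawler",  # keep polling until the crawler is idle
            },
            "CrawlerFinished": {"Type": "Succeed"},
        },
    }
    return {
        "StartAt": "Dependencies",
        "States": {
            # Parallel completes only when every branch has completed.
            "Dependencies": {
                "Type": "Parallel",
                "Branches": [job_branch, crawler_branch],
                "Next": "RunFinalJob",
            },
            "RunFinalJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": final_job},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs and job names, for illustration only.
    definition = build_definition(
        "PrevJobA",
        "arn:aws:lambda:us-east-1:123456789012:function:start-crawler",
        "arn:aws:lambda:us-east-1:123456789012:function:crawler-status",
        "FinalJob",
    )
    print(json.dumps(definition, indent=2))
```

Note that `READY` only says the crawler is no longer running; if a failed crawl should block the final job, the status Lambda would also need to report the result of the last crawl.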

With CloudWatch Events, some simple cases can be implemented (such as triggering a job when a crawler completes). The CloudWatch rule is created the same way as for any other type of CloudWatch event; you just need to select the appropriate event type (see events with "detail-type": "Glue Crawler State Change"). With this approach it's not very convenient to visually monitor what's currently happening, but it's still a good solution for simple cases.
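For the simple case, something like the following could work: an event pattern for the rule, and a Lambda handler that starts a job when the crawler succeeds. This is a sketch under assumptions. `CrawlerB` comes from the question, `DownstreamJob` is a made-up job name, and the optional `glue` argument is only there so the handler can be exercised without AWS access.

```python
# Event pattern for the CloudWatch Events rule: match only successful
# runs of one crawler. "CrawlerB" is the crawler name from the question.
CRAWLER_SUCCEEDED_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"crawlerName": ["CrawlerB"], "state": ["Succeeded"]},
}

DOWNSTREAM_JOB = "DownstreamJob"  # hypothetical Glue job name


def handler(event, context, glue=None):
    """Lambda entry point: start the downstream job on crawler success."""
    # Re-check the event even though the rule already filters, so a
    # misconfigured rule cannot start the job by accident.
    if event.get("detail-type") != "Glue Crawler State Change":
        return {"started": False}
    if event.get("detail", {}).get("state") != "Succeeded":
        return {"started": False}

    if glue is None:
        import boto3  # provided by the Lambda Python runtime

        glue = boto3.client("glue")

    response = glue.start_job_run(JobName=DOWNSTREAM_JOB)
    return {"started": True, "jobRunId": response["JobRunId"]}
```

The pattern goes into the rule definition, and the Lambda is set as the rule's target.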

Besides that, you can combine the two approaches: Glue triggers the crawler on a defined schedule, a CloudWatch rule triggers a Lambda when it receives a "Succeeded" event from the Glue crawler, and the Lambda then starts a Step Functions execution that runs the ETL jobs in the proper order.
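The glue code for the combined approach can be quite small. A rough sketch of the Lambda, assuming a placeholder state machine ARN and passing the crawler name through as execution input (both are my assumptions, not something from the question):

```python
import json

# Placeholder ARN; replace with the state machine that runs the ETL jobs.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:EtlPipeline"


def start_etl_workflow(event, context, sfn=None):
    """Lambda entry point: start the ETL state machine when the
    crawler event reports "Succeeded"; ignore anything else."""
    detail = event.get("detail", {})
    if detail.get("state") != "Succeeded":
        return None

    if sfn is None:
        import boto3  # provided by the Lambda Python runtime

        sfn = boto3.client("stepfunctions")

    response = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        # Tell the workflow which crawler triggered it.
        input=json.dumps({"crawlerName": detail.get("crawlerName")}),
    )
    return response["executionArn"]
```

The optional `sfn` argument exists only so the function can be tested with a stub client; in Lambda it is left unset.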

Yuriy Bondaruk answered May 16 '23 06:05