Is it possible to trigger an AWS Glue crawler when new files are uploaded to an S3 bucket, given that the crawler is "pointed" at that bucket? In other words: a file upload generates an event that causes the AWS Glue crawler to analyse the file. I know that schedule-based crawling exists, but I have never found an event-based option.
You can use the AWS CLI or AWS Glue API to configure triggers with both jobs and crawlers. Sign in to the AWS Management Console and open the AWS Glue console at https://console.aws.amazon.com/glue/ . In the navigation pane, under ETL, choose Triggers. Then choose Add trigger.
On the Crawlers tab, select your crawler, and then choose Add. The trigger appears on the graph. On the graph, to the right of the job trigger that you just created, choose Add node. On the Jobs tab, select the job that you want to start when the crawler run completes, and then choose Add.
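The same "start a job when the crawler run completes" wiring can probably also be set up through the Glue API instead of the console. The following is only a sketch, assuming boto3 is used as the API client; the trigger, workflow, crawler, and job names are hypothetical placeholders, and the helper function is not part of any AWS library.

```python
def crawler_then_job_trigger(name, workflow, crawler, job):
    """Build keyword arguments for the Glue CreateTrigger API call.

    All four names are placeholders supplied by the caller.
    """
    return {
        "Name": name,
        "WorkflowName": workflow,
        # A conditional trigger fires when its predicate is satisfied.
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler,
                    "CrawlState": "SUCCEEDED",
                }
            ]
        },
        # Action to run once the crawler has succeeded.
        "Actions": [{"JobName": job}],
    }


kwargs = crawler_then_job_trigger(
    name="after-crawl",          # hypothetical trigger name
    workflow="ingest-workflow",  # hypothetical workflow name
    crawler="my-s3-crawler",     # hypothetical crawler name
    job="my-etl-job",            # hypothetical job name
)
```

With AWS credentials configured, the dictionary would be passed on as `boto3.client("glue").create_trigger(**kwargs)`. Building the arguments separately keeps the sketch easy to inspect before anything is created in the account.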
Sign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/ . In the Buckets list, choose the name of the bucket that you want to enable events for. Choose Properties. Navigate to the Event Notifications section and choose Create event notification.
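One common way to connect such an event notification to a crawler is to send it to an AWS Lambda function that starts the crawler. The handler below is a minimal sketch under that assumption: it is not taken from the answers above, the crawler name and the `CRAWLER_NAME` environment variable are hypothetical, and the optional `glue` parameter exists only so the function can be exercised without AWS access.

```python
import os
import urllib.parse

# Hypothetical crawler name; assumed to be set on the Lambda function.
CRAWLER_NAME = os.environ.get("CRAWLER_NAME", "my-s3-crawler")


def uploaded_keys(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 events (spaces become '+').
            pairs.append((bucket, urllib.parse.unquote_plus(key)))
    return pairs


def lambda_handler(event, context, glue=None):
    """Start the crawler when the event contains at least one S3 object."""
    if glue is None:
        import boto3  # available in the Lambda Python runtime
        glue = boto3.client("glue")
    keys = uploaded_keys(event)
    if not keys:
        return {"started": False, "reason": "no S3 records"}
    try:
        glue.start_crawler(Name=CRAWLER_NAME)
    except glue.exceptions.CrawlerRunningException:
        # A crawler cannot be started again while a run is in progress.
        return {"started": False, "reason": "crawler already running"}
    return {"started": True, "objects": len(keys)}
```

The function's execution role would need permission to call `glue:StartCrawler`. Because a crawler cannot be started while it is already running, a burst of uploads leads to fewer crawler runs than events; the sketch simply reports that case instead of retrying.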
AWS Glue automatically crawls your Amazon S3 data, identifies data formats, and then suggests schemas for use with other AWS analytic services. This post walks you through the process of using AWS Glue to crawl your data on Amazon S3 and build a metadata store that can be used with other AWS offerings.
Here is a step-by-step guide (link below) for a similar architecture. (Refer to the picture above for the architecture.)
https://wellarchitectedlabs.com/Cost/Cost_and_Usage_Analysis/300_Automated_CUR_Updates_and_Ingestion/Lab_Guide.html