Is it possible to run an Airflow task only when a specific event occurs, such as a file being dropped into a specific S3 bucket? Something similar to AWS Lambda events.
There is S3KeySensor, but I don't know if it does what I want (i.e. run a task only when the event occurs).
Here is an example to make the question clearer. I have a sensor defined as follows:
# Airflow 1.10 import path; in Airflow 2 the sensor lives in
# airflow.providers.amazon.aws.sensors.s3
from airflow.sensors.s3_key_sensor import S3KeySensor

sensor = S3KeySensor(
    task_id='run_on_every_file_drop',
    bucket_key='file-to-watch-*',
    wildcard_match=True,
    bucket_name='my-sensor-bucket',
    timeout=18 * 60 * 60,
    poke_interval=120,
    dag=dag,
)
Using the above sensor object, Airflow's behavior for the sensor task is as follows:

- It runs the task if there is already an object matching the wildcard in the S3 bucket my-sensor-bucket, even before the DAG is switched ON in the Airflow admin UI (I don't want to run the task due to the presence of past S3 objects).
- Once the sensing criteria are met, it runs the task only once; subsequent file drops into my-sensor-bucket do not trigger it again (I want the task to run on every file drop into my-sensor-bucket).

I'm trying to understand whether tasks in Airflow can only be run on a schedule (like cron jobs) or via sensors (once, when the sensing criteria are met), or whether they can be set up as an event-based pipeline (something similar to AWS Lambda).
Airflow is fundamentally organized around time-based scheduling.

You can hack around it to get what you want, though, in a few ways:

1. Configure an S3 event notification that invokes an AWS Lambda function, and have the Lambda trigger a DAG run through Airflow's REST API. The DAG itself has no schedule; every file drop produces exactly one run.
2. Route the S3 event notification to an SQS queue instead, and start the DAG with an SQS sensor; the DAG proceeds as soon as a change event arrives, rather than polling the bucket contents.
3. Keep the S3KeySensor you have, run the DAG on a tight schedule (or have it re-trigger itself), and delete each key after it passes the sensor, so the next run only fires on a genuinely new file.

If you go with route 3, you'll be deleting the keys that passed the sensor before the next run of the DAG and its sensor. Note that due to S3's eventual consistency, routes 1 and 2 are more reliable. Sketches of routes 1 and 3 follow below.
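For route 1, here is a minimal sketch of the Lambda side, assuming Airflow 2.x with the stable REST API and basic auth enabled; the host, DAG id, and credentials are placeholders:

# Hypothetical Lambda handler: trigger one Airflow DAG run per S3 event record.
# Assumes the Airflow 2.x stable REST API with basic auth; AIRFLOW_ENDPOINT,
# the DAG id, and the credentials below are placeholders.
import base64
import json
import urllib.request

AIRFLOW_ENDPOINT = "https://my-airflow-host/api/v1/dags/process_s3_drop/dagRuns"

def handler(event, context):
    for record in event["Records"]:
        key = record["s3"]["object"]["key"]
        payload = json.dumps({"conf": {"s3_key": key}}).encode()
        req = urllib.request.Request(
            AIRFLOW_ENDPOINT,
            data=payload,
            method="POST",
            headers={
                "Content-Type": "application/json",
                # Use a secrets manager for real credentials.
                "Authorization": "Basic " + base64.b64encode(b"user:pass").decode(),
            },
        )
        urllib.request.urlopen(req)

For route 3, one way to approximate "run on every drop" is a self-re-triggering DAG: the sensor waits for a key, a cleanup task deletes whatever matched, and a TriggerDagRunOperator starts the next cycle. This is a sketch assuming the Airflow 2 Amazon provider (DAG and task ids are placeholders); switch the DAG on and trigger it once manually to start the loop:

# Sketch of route 3: sense a new key, delete it, then re-trigger this same DAG
# so it goes back to waiting for the next drop.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.operators.trigger_dagrun import TriggerDagRunOperator
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
# Amazon provider >= 2.4; older provider versions use ...sensors.s3_key
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

def delete_matched_keys():
    hook = S3Hook()
    keys = hook.list_keys(bucket_name="my-sensor-bucket", prefix="file-to-watch-")
    if keys:
        hook.delete_objects(bucket="my-sensor-bucket", keys=keys)

with DAG(
    dag_id="run_on_every_file_drop",
    schedule_interval=None,  # runs only when triggered, never on a clock
    start_date=datetime(2019, 1, 1),
    catchup=False,
) as dag:
    wait_for_file = S3KeySensor(
        task_id="wait_for_file",
        bucket_key="file-to-watch-*",
        wildcard_match=True,
        bucket_name="my-sensor-bucket",
    )
    # ...your actual processing task(s) would sit between sensor and cleanup...
    cleanup = PythonOperator(
        task_id="delete_processed_keys",
        python_callable=delete_matched_keys,
    )
    rerun = TriggerDagRunOperator(
        task_id="rerun_this_dag",
        trigger_dag_id="run_on_every_file_drop",
    )
    wait_for_file >> cleanup >> rerun

The delete-then-retrigger gap is exactly where eventual consistency can bite (the next sensor run may briefly see a key you just deleted), which is why routes 1 and 2 are the safer options.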