
What's the best strategy to monitor S3 bucket for missing a new file in the last x hours?

I have a use case where a process puts a file into an S3 bucket every 6 hours. The bucket already contains thousands of files, and I want to generate an SNS alert (or something similar) if no new file has been added in the last 7 hours. What would be a reasonable approach? Thanks

dawit asked Dec 08 '22 14:12



1 Answer

There are a few potential approaches:

  • Check the bucket every few minutes
  • Keep track of the last new file
  • Use an Amazon CloudWatch Alarm

Check the bucket every few minutes

Configure Amazon CloudWatch Events to trigger an AWS Lambda function every few minutes (depending upon how quickly you want it reported). The function obtains a listing of the bucket and checks the timestamp of the most recently added object. If that timestamp is more than 7 hours old, it sends the alert.

This approach is very simple but is doing a lot of work every few minutes, including during the 7 hours after an object was added. Plus, if you have lots of objects, this can consume a lot of Lambda time and API calls.
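
As a rough illustration of this polling approach, here is a minimal sketch of the Lambda logic. The bucket name and the function names are made up for the example, and the S3 client is passed in as an argument so the helpers can be exercised without AWS access. It assumes boto3's `list_objects_v2` paginator, which returns a `LastModified` timestamp for each object.

```python
from datetime import datetime, timedelta, timezone


def newest_object_time(s3_client, bucket):
    """Return the most recent LastModified in the bucket, or None if it is empty."""
    newest = None
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from a page when no keys are returned.
        for obj in page.get("Contents", []):
            if newest is None or obj["LastModified"] > newest:
                newest = obj["LastModified"]
    return newest


def is_stale(last_seen, now, max_age=timedelta(hours=7)):
    """True when the bucket is empty or the newest object is older than max_age."""
    return last_seen is None or now - last_seen > max_age


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without AWS

    bucket = "my-ingest-bucket"  # hypothetical bucket name
    newest = newest_object_time(boto3.client("s3"), bucket)
    stale = is_stale(newest, datetime.now(timezone.utc))
    # Alerting (for example, publishing to an SNS topic) would go here when stale.
    return {"bucket": bucket, "stale": stale}
```

Note that every run pages through the whole bucket, which is exactly the cost described above. If the objects share a date-based key prefix, passing `Prefix=` to the paginator might narrow the listing.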

Keep track of the last new file

  • Configure an Event on the Amazon S3 bucket to trigger an AWS Lambda function whenever a new file is added to the bucket. Store the current time in a DynamoDB table (or, if you really want to save costs, store it in the Systems Manager Parameter Store or an S3 object in another bucket). This will update the date whenever a new file is added.
  • Configure Amazon CloudWatch Events to trigger an AWS Lambda function every few minutes (depending upon how quickly you want it reported) that checks the "last updated date" in DynamoDB (or wherever it was stored). If it is more than 7 hours old, trigger an alert.

While this approach has more components, it is actually a simpler solution because it never has to look through the list of objects in S3. Instead, it just remembers when the last object was added.
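
The two functions described above could look roughly like this. This sketch uses the Systems Manager Parameter Store option, and the parameter name and handler names are invented for the example. The SSM client is passed in so the logic can be tried without AWS.

```python
from datetime import datetime, timedelta, timezone

PARAM_NAME = "/monitoring/last-upload-time"  # hypothetical parameter name


def record_upload(ssm_client, now):
    """Store the time of the latest upload, overwriting the previous value."""
    ssm_client.put_parameter(
        Name=PARAM_NAME, Value=now.isoformat(), Type="String", Overwrite=True
    )


def hours_since_last_upload(ssm_client, now):
    """Read the stored timestamp and return its age in hours."""
    # Raises an error if the parameter was never written; a real function
    # would probably treat that case as "alert".
    value = ssm_client.get_parameter(Name=PARAM_NAME)["Parameter"]["Value"]
    return (now - datetime.fromisoformat(value)) / timedelta(hours=1)


def on_object_created(event, context):
    """Triggered by the S3 event whenever a new object is added."""
    import boto3  # imported here so the helpers above can be tested without AWS

    record_upload(boto3.client("ssm"), datetime.now(timezone.utc))


def on_schedule(event, context):
    """Triggered every few minutes by the scheduled rule."""
    import boto3

    age = hours_since_last_upload(boto3.client("ssm"), datetime.now(timezone.utc))
    # The alert itself (for example, an SNS publish) would go here.
    return {"alert": age > 7}
```

The scheduled function makes one small read per run no matter how many objects the bucket holds.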

You could come up with an even smarter method that, instead of checking every few minutes, schedules an alert function to run in 7 hours' time. Whenever a new file is added, it pushes the schedule back so that it is 7 hours away again. It's like constantly delaying a dentist appointment. :)
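
One way to sketch the "keep delaying" idea is with a one-time schedule in Amazon EventBridge Scheduler. The answer does not name that service, so treat it as an assumption. The schedule name, target ARN and role ARN are placeholders, and the schedule is assumed to exist already, since `update_schedule` only modifies an existing one.

```python
from datetime import datetime, timedelta, timezone

SCHEDULE_NAME = "missing-file-alert"  # hypothetical schedule name


def alert_time_expression(now, delay=timedelta(hours=7)):
    """Build a one-time 'at(...)' schedule expression for now + delay, in UTC."""
    fire_at = (now + delay).astimezone(timezone.utc)
    return "at({})".format(fire_at.strftime("%Y-%m-%dT%H:%M:%S"))


def push_back_alert(scheduler_client, now, target_arn, role_arn):
    """Move the alert so that it fires 7 hours after the latest upload."""
    scheduler_client.update_schedule(
        Name=SCHEDULE_NAME,
        ScheduleExpression=alert_time_expression(now),
        ScheduleExpressionTimezone="UTC",
        FlexibleTimeWindow={"Mode": "OFF"},
        # target_arn: the alert Lambda; role_arn: a role the scheduler can assume.
        Target={"Arn": target_arn, "RoleArn": role_arn},
    )


def on_object_created(event, context):
    """Triggered by the S3 event; both ARNs below are placeholders."""
    import boto3  # imported here so the helpers above can be tested without AWS

    push_back_alert(
        boto3.client("scheduler"),
        datetime.now(timezone.utc),
        target_arn="arn:aws:lambda:REGION:ACCOUNT:function:alert",
        role_arn="arn:aws:iam::ACCOUNT:role/scheduler-invoke",
    )
```

With this layout nothing runs at all while files keep arriving on time. The alert function is only invoked when the schedule is finally allowed to fire.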

Use an Amazon CloudWatch Alarm

This is a simpler method that uses a CloudWatch Alarm to trigger the notification.

  • Configure the S3 bucket to trigger a Lambda function whenever an object is added. The Lambda function sends a Custom Metric to Amazon CloudWatch.
  • Create a CloudWatch Alarm to trigger a notification whenever the SUM of the Custom Metric is zero for the past 6 hours. Also configure it to trigger if the Alarm enters the INSUFFICIENT_DATA state, so that it correctly triggers when no data is sent (which is more likely than a metric of zero since the Lambda function won't send data when no objects are created).

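The only downside is that the console offers only a few preset alarm periods. It can be set for 6 hours, and a 7-hour window should also be achievable by combining a 1-hour period with 7 evaluation periods, since the alarm evaluates the period multiplied by the number of evaluation periods.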
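
A minimal sketch of the metric and the alarm, assuming boto3's `put_metric_data` and `put_metric_alarm` calls. The namespace, metric name and alarm name are invented for the example, and the defaults follow the 6-hour window mentioned above.

```python
NAMESPACE = "Custom/S3Monitoring"  # hypothetical namespace
METRIC_NAME = "ObjectsCreated"     # hypothetical metric name


def record_objects_created(cloudwatch_client, count=1):
    """Send one data point for the objects that were just added."""
    cloudwatch_client.put_metric_data(
        Namespace=NAMESPACE,
        MetricData=[{"MetricName": METRIC_NAME, "Value": count, "Unit": "Count"}],
    )


def alarm_settings(topic_arn, period_seconds=6 * 3600, evaluation_periods=1):
    """Keyword arguments for put_metric_alarm: Sum <= 0 over the whole window."""
    return {
        "AlarmName": "no-new-s3-objects",
        "Namespace": NAMESPACE,
        "MetricName": METRIC_NAME,
        "Statistic": "Sum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": 0,
        "ComparisonOperator": "LessThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],
        # Also notify when no data arrives at all, which is the likely case here.
        "InsufficientDataActions": [topic_arn],
    }


def create_alarm(cloudwatch_client, topic_arn):
    """One-off setup step, run from a script or a deployment tool."""
    cloudwatch_client.put_metric_alarm(**alarm_settings(topic_arn))


def on_object_created(event, context):
    """Triggered by the S3 event; counts the records in the notification."""
    import boto3  # imported here so the helpers above can be tested without AWS

    record_objects_created(boto3.client("cloudwatch"), len(event.get("Records", [])))
```

Calling `alarm_settings(topic_arn, period_seconds=3600, evaluation_periods=7)` would give a 7-hour window instead, if that combination suits the use case.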

How to alert

As to how to alert somebody, sending a message to an Amazon SNS topic is a good idea. People can subscribe via email, SMS and various other methods.
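
For completeness, publishing the alert might look like the sketch below. The wording of the message and the function names are only an illustration, and the SNS client is passed in so the code can be tried without AWS.

```python
def build_alert(bucket, hours):
    """Return a (subject, message) pair describing the missing file."""
    subject = "No new file in {} for {} hours".format(bucket, hours)
    message = (
        "No new object has been added to the S3 bucket '{}' "
        "in the last {} hours.".format(bucket, hours)
    )
    # SNS limits the subject to 100 characters, so trim it to be safe.
    return subject[:100], message


def send_alert(sns_client, topic_arn, bucket, hours=7):
    """Publish the alert to the topic; subscribers receive it by email, SMS, etc."""
    subject, message = build_alert(bucket, hours)
    return sns_client.publish(TopicArn=topic_arn, Subject=subject, Message=message)
```

In a Lambda function, `sns_client` would typically be `boto3.client("sns")`, and `topic_arn` the ARN of a topic created beforehand.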

John Rotenstein answered Apr 08 '23 13:04
