
How to keep only Latest "N" number of files/objects in S3 bucket periodically using bash script or any other methods

I am using an S3 bucket to store my web application's log files. Is there any option to keep only the latest 20 files, regardless of when they were created? I can't use the S3 auto-expiry (lifecycle) option, because I always need the latest 20 files in my bucket.

asked Sep 21 '25 by Praveen George

2 Answers

I hope this answer solves your problem. Sort the listing chronologically (the date/time columns sort lexicographically), use head to exclude the newest 20 lines from deletion, and strip the date/time/size columns to recover the object key:

aws s3 ls s3://your-bucket/ --recursive | sort | head -n -20 | awk '{print substr($0, index($0, $4))}' | while read -r line ; do
    echo "Removing \"${line}\""
    aws s3 rm "s3://your-bucket/${line}"
done

Note that substr/index preserves spaces inside object keys, which reassigning awk fields would collapse. For more details: https://stackoverflow.com/a/49373909/16885246

answered Sep 22 '25 by Kaviyarasu P

Option 1:

a) Use the S3 notification service to trigger a Lambda function on each PutObject event in S3.

b) List the objects in the bucket using the Python boto3 SDK and store (key, last-modified date) pairs in a list.

c) Sort the list by date/time and delete everything from the 21st-newest object onward.
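The steps in Option 1 can be sketched as a boto3 Lambda handler. This is an illustrative sketch, not a tested implementation: the handler name follows the standard Lambda convention, the bucket name is read from the S3 event, keys_to_delete is a hypothetical helper, and the keep count of 20 matches the question.

```python
def keys_to_delete(objects, keep=20):
    """Given (key, last_modified) pairs, return the keys that fall
    outside the newest `keep` objects."""
    newest_first = sorted(objects, key=lambda o: o[1], reverse=True)
    return [key for key, _ in newest_first[keep:]]

def lambda_handler(event, context):
    import boto3  # lazy import: only available/needed inside AWS

    s3 = boto3.client("s3")
    # Bucket name from the triggering S3 PutObject notification event.
    bucket = event["Records"][0]["s3"]["bucket"]["name"]

    # Step b): collect (key, last-modified) for every object in the bucket.
    objects = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            objects.append((obj["Key"], obj["LastModified"]))

    # Step c): sort by date and delete everything beyond the newest 20.
    for key in keys_to_delete(objects, keep=20):
        s3.delete_object(Bucket=bucket, Key=key)
```

Keeping the sort-and-select logic in a separate pure function makes it easy to unit-test without touching S3.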

Option 2:

a) Configure SQS as the notification target and trigger a Lambda function for each PutObject event.

b) Schedule a Lambda function based on your requirements.

c) List the objects in the bucket using the Python boto3 SDK and store (key, last-modified date) pairs in a list.

d) Sort the list by date/time and delete everything from the 21st-newest object onward.
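Steps b)-d) of Option 2 can be sketched as a single scheduled job, e.g. a Lambda invoked on an EventBridge schedule. The bucket name, KEEP count, and prune_plan helper below are assumptions for illustration only:

```python
KEEP = 20  # number of newest objects to retain (assumed from the question)

def prune_plan(listing, keep=KEEP):
    """Sort (key, last_modified) pairs oldest-first and return the keys
    that must be removed so only `keep` objects remain."""
    oldest_first = sorted(listing, key=lambda o: o[1])
    excess = len(oldest_first) - keep
    return [key for key, _ in oldest_first[:excess]] if excess > 0 else []

def main(bucket="my-log-bucket"):  # hypothetical bucket name
    import boto3  # lazy import: only needed when actually run against AWS

    s3 = boto3.client("s3")
    # Step c): gather (key, last-modified) pairs across all result pages.
    listing = [
        (obj["Key"], obj["LastModified"])
        for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket)
        for obj in page.get("Contents", [])
    ]
    # Step d): delete the oldest objects beyond the newest KEEP.
    for key in prune_plan(listing):
        print("deleting", key)
        s3.delete_object(Bucket=bucket, Key=key)
```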

Choose option 1 or option 2 based on your requirements:

If writes/reads/downloads to your S3 bucket are time-intensive operations, choose option 1.
If they are not time-intensive operations, choose option 2.

answered Sep 22 '25 by Mohan Shanmugam