Is there a way to filter s3 objects by last modified date in boto3? I've constructed a large text file list of all the contents in a bucket. Some time has passed and I'd like to list only objects that were added after the last time I looped through the entire bucket.
I know I can use the Marker property to start from a certain object name, so I could give it the last object I processed in the text file, but that does not guarantee a new object wasn't added before that object name. For example, if the last file in the text file was oak.txt and a new file called apple.txt was added later, it would not be picked up.
import boto3

s3_resource = boto3.resource('s3')
client = boto3.client('s3')

def list_rasters(bucket):
    bucket = s3_resource.Bucket(bucket)
    for bucket_obj in bucket.objects.filter(Prefix="testing_folder/"):
        print(bucket_obj.key)
        print(bucket_obj.last_modified)
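In other words, the goal is something along these lines: compare each object's last_modified (a timezone-aware UTC datetime) against a cutoff saved from the previous pass. The cutoff value below is only a placeholder:
from datetime import datetime, timezone

last_run = datetime(2020, 1, 1, tzinfo=timezone.utc)  # placeholder: timestamp of the previous pass

def list_new_rasters(bucket_name, last_run):
    bucket = s3_resource.Bucket(bucket_name)
    for bucket_obj in bucket.objects.filter(Prefix="testing_folder/"):
        # last_modified is timezone-aware (UTC), so compare it against an aware datetime
        if bucket_obj.last_modified > last_run:
            print(bucket_obj.key, bucket_obj.last_modified)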
The following code snippet gets all objects under a specific folder and checks whether each file's last-modified time is after the time you specify. Replace YEAR, MONTH, DAY with your values.
import boto3
import datetime

# bucket name
bucket_name = 'BUCKET NAME'
# folder name
folder_name = 'FOLDER NAME'
# bucket resource
s3 = boto3.resource('s3')
bucket = s3.Bucket(bucket_name)

def lambda_handler(event, context):
    for file in bucket.objects.filter(Prefix=folder_name):
        # compare dates
        if file.last_modified.replace(tzinfo=None) > datetime.datetime(YEAR, MONTH, DAY, tzinfo=None):
            # print results
            print('File Name: %s ---- Date: %s' % (file.key, file.last_modified))
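A variation on the same idea, kept timezone-aware so you don't have to strip tzinfo, and tied back to the question's text-file workflow by appending the new keys to the existing list. The cutoff, bucket, and file names below are placeholders for illustration:
import boto3
from datetime import datetime, timezone

s3 = boto3.resource('s3')
bucket = s3.Bucket('BUCKET NAME')

# placeholders: the timestamp of the previous run and the existing list file
cutoff = datetime(2021, 1, 1, tzinfo=timezone.utc)
list_file = 'bucket_contents.txt'

with open(list_file, 'a') as f:
    for obj in bucket.objects.filter(Prefix='FOLDER NAME'):
        # last_modified is already timezone-aware (UTC), so no replace() is needed
        if obj.last_modified > cutoff:
            f.write(obj.key + '\n')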
The code snippet below uses the S3 Object get() action with an IfModifiedSince datetime argument so that only objects modified after that time are returned. The script prints the file names, which was the original question, but also saves the files locally.
import boto3
import io
from datetime import date, datetime, timedelta

# Defining AWS S3 resources
s3 = boto3.resource('s3')
bucket = s3.Bucket('<bucket_name>')
prefix = '<object_key_prefix, if any>'

# note this is based on UTC time
yesterday = datetime.fromisoformat(str(date.today() - timedelta(days=1)))

# function to retrieve the StreamingBody from S3 with the IfModifiedSince argument
def get_object(s3_object):
    try:
        obj = s3_object.get(IfModifiedSince=yesterday)
        return obj['Body']
    except Exception:
        # not modified since the cutoff (or the get failed), so skip it
        return False

# obtain a list of s3 Objects with prefix filter
files = list(bucket.objects.filter(Prefix=prefix))

# Iterating through the list of files
# Loading each streaming body into a file with the same name
# Printing the file name and saving the file
# Note: skipping the first entry since it's only the directory
for file in files[1:]:
    file_name = file.key.split(prefix)[1]  # getting the file name of the S3 object
    s3_file = get_object(file)  # streaming body we need to iterate through
    if s3_file:  # object was modified after the cutoff date
        print(file_name)  # prints files modified since the cutoff
        try:
            with io.FileIO(file_name, 'w') as f:
                for i in s3_file:  # iterating through the streaming body
                    f.write(i)
        except TypeError as e:
            print(e, file)
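One note on the get()-based approach: when an object has not been modified since the given time, S3 responds with 304 Not Modified, which boto3 surfaces as a ClientError, so you may prefer to catch that specific case rather than every exception. A minimal sketch of a narrower handler (the helper name is just for illustration, and it assumes the same yesterday cutoff as above):
import botocore.exceptions

def get_object_if_modified(s3_object, since):
    # Return the StreamingBody if the object was modified after `since`, else None
    try:
        return s3_object.get(IfModifiedSince=since)['Body']
    except botocore.exceptions.ClientError as e:
        # a 304 simply means "not modified since the given time"
        if e.response.get('ResponseMetadata', {}).get('HTTPStatusCode') == 304:
            return None
        raise  # anything else (permissions, missing key, ...) is a real error
In the loop above you would then call s3_file = get_object_if_modified(file, yesterday) instead of get_object(file).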