I was trying to figure out a way to clean up my S3 bucket. I want to delete all keys that are older than X days (in my case, X is 30 days).
I couldn't figure out a way to delete the objects in S3.
I used the following approaches, none of which worked. (By "worked" I mean: I tried getting the object after X days, and S3 was still serving it. I was expecting an "Object not found" or "Object expired" response.)
Approach 1:
from datetime import datetime, timedelta
from boto.s3.key import Key

k = Key(bucket)
k.key = my_key_name
expires = datetime.utcnow() + timedelta(seconds=10)
expires = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
k.set_contents_from_filename(filename, headers={'Expires': expires})
Approach 2:
k = Key(bucket)
k.key = "Event_" + str(key_name) + "_report"
expires = datetime.utcnow() + timedelta(seconds=10)
expires = expires.strftime("%a, %d %b %Y %H:%M:%S GMT")
k.set_metadata('Expires', expires)
k.set_contents_from_filename(filename)
If anyone can share code that has worked for them to delete S3 objects after a given age, that would be really great.
The Expires header only controls HTTP caching; S3 stores it but never deletes anything because of it. To delete objects from S3 that are older than X days, use lifecycle policies. For example, suppose you have these objects:
logs/first
logs/second
logs/third
otherfile.txt
To expire everything under logs/ after 30 days, you'd say:
import boto
from boto.s3.lifecycle import (
    Lifecycle,
    Expiration,
)

lifecycle = Lifecycle()
lifecycle.add_rule(
    'rulename',
    prefix='logs/',
    status='Enabled',
    expiration=Expiration(days=30),
)

s3 = boto.connect_s3()
bucket = s3.get_bucket('boto-lifecycle-test')
bucket.configure_lifecycle(lifecycle)
You can also retrieve the lifecycle configuration:
>>> config = bucket.get_lifecycle_config()
>>> print(config[0])
<Rule: rulename>
>>> print(config[0].prefix)
logs/
>>> print(config[0].expiration)
<Expiration: in: 30 days>
The answer by jamesis uses boto, which is an older library and has been deprecated. The currently supported version is boto3.
The same expiration policy on the logs/ folder can be set as follows:
import boto3
from botocore.exceptions import ClientError

client = boto3.client('s3')
try:
    policy_status = client.put_bucket_lifecycle_configuration(
        Bucket='boto-lifecycle-test',
        LifecycleConfiguration={
            'Rules': [
                {
                    'Expiration': {
                        'Days': 30
                    },
                    'Filter': {
                        'Prefix': 'logs/'
                    },
                    'Status': 'Enabled'
                }
            ]
        })
except ClientError as e:
    print("Unable to apply bucket policy.\nReason: {0}".format(e))
Note that a lifecycle configuration applies to the whole bucket, so this will override any existing lifecycle configuration on it. A good thing to do is to check that the bucket exists and that you have permission to access it before applying the expiration configuration, i.e. before the try/except:
try:
    client.head_bucket(Bucket='boto-lifecycle-test')
except ClientError as e:
    print("Bucket is missing or access is denied.\nReason: {0}".format(e))
Since the logs/ folder isn't a bucket itself but rather a prefix of objects within the bucket boto-lifecycle-test, the rest of the bucket can be covered by a different expiration policy. You can check the applied configuration from the result in policy_exists, as below.
policy_exists = client.get_bucket_lifecycle_configuration(
    Bucket='boto-lifecycle-test'
)
bucket_policy = policy_exists['Rules'][0]['Expiration']
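Because the lifecycle configuration is a list of rules, the bucket can carry a different expiration policy for other prefixes at the same time. A sketch of such a configuration (the rule IDs and the 90-day rule are hypothetical; the bucket name is the example one from above):

```python
# Hypothetical two-rule configuration: expire logs/ after 30 days,
# and everything else after 90 days, via distinct filters.
LIFECYCLE_RULES = [
    {
        'ID': 'expire-logs',  # hypothetical rule ID
        'Expiration': {'Days': 30},
        'Filter': {'Prefix': 'logs/'},
        'Status': 'Enabled',
    },
    {
        'ID': 'expire-everything-else',  # hypothetical rule ID
        'Expiration': {'Days': 90},
        'Filter': {'Prefix': ''},  # empty prefix matches the whole bucket
        'Status': 'Enabled',
    },
]

def apply_lifecycle(bucket_name):
    """Replace the bucket's lifecycle configuration with the rules above."""
    import boto3
    client = boto3.client('s3')
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={'Rules': LIFECYCLE_RULES},
    )
```

Objects under logs/ match both rules here; keep that overlap in mind when mixing broad and narrow filters.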
More information about setting the expiration policy can be found at Expiry policy.
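One caveat: S3 evaluates lifecycle rules asynchronously, typically about once a day, so objects are not removed the instant they cross the age threshold. If you need an immediate one-off cleanup instead, a sketch along these lines deletes keys older than X days with boto3 (`is_older_than` and `delete_old_objects` are hypothetical helper names, not part of any library):

```python
from datetime import datetime, timedelta, timezone

def is_older_than(last_modified, days, now=None):
    """Hypothetical helper: True if `last_modified` is more than `days` old."""
    now = now or datetime.now(timezone.utc)
    return last_modified < now - timedelta(days=days)

def delete_old_objects(bucket_name, days=30):
    """One-off sweep: delete every object in the bucket older than `days`."""
    import boto3
    s3 = boto3.resource('s3')
    bucket = s3.Bucket(bucket_name)
    for obj in bucket.objects.all():
        # obj.last_modified is a timezone-aware datetime
        if is_older_than(obj.last_modified, days):
            obj.delete()
```

Unlike a lifecycle rule, this runs once and stops; you'd have to schedule it yourself to keep the bucket trimmed.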
The Python script by Vaulstein, as originally posted, throws a MalformedXML exception. The trailing comma after 'Status': 'Enabled', is valid Python and isn't the cause; the error comes from specifying both a top-level 'Prefix' and a 'Filter' (S3 accepts only one of the two) and from combining 'Days' with 'ExpiredObjectDeleteMarker' inside 'Expiration'. Remove the top-level 'Prefix' and the 'ExpiredObjectDeleteMarker' key.