I've got a very large bucket (hundreds of thousands of objects). I've got a path (lets say s3://myBucket/path1/path2). /path2 gets uploads that are also folders. So a sample might look like:
s3://myBucket/path1/path2/v6.1.0
s3://myBucket/path1/path2/v6.1.1
s3://myBucket/path1/path2/v6.1.102
s3://myBucket/path1/path2/v6.1.2
s3://myBucket/path1/path2/v6.1.25
s3://myBucket/path1/path2/v6.1.99
S3 doesn't take into account version number sorting (which makes sense) but alphabetically the last in the list is not the last uploaded. In that example .../v6.1.102 is the newest.
Here's what I've got so far:
aws s3api list-objects
--bucket myBucket
--query "sort_by(Contents[?contains(Key, \`path1/path2\`)],&LastModified)"´
--max-items 20000
So one problem here is max-items seems to start alphabetically from the all files recursively in the bucket. 20000 does get to my files but it's a pretty slow process to go through that many files.
So my questions are twofold:
1 - This is still searching the whole bucket but I just want to narrow it down to path2/ . Can I do this?
2 - This lists just objects, is it possible to pull up just a path list instead?
Basically the end goal is I just want a command to return the newest folder name like 'v6.1.102' from the example above.
To answer #1, you could add the --prefix path1/path2
to limit what you're querying in the bucket.
In terms of sorting by last modified, I can only think of using an SDK to combine the list_objects_v2
and head_object
(boto3) to get last modified on the objects and programmatically sort
Update
Alternatively, you could reverse sort by LastModified
in jmespath and return the first item to give you the most recent object and gather the directory from there.
aws s3api list-objects-v2 \
--bucket myBucket \
--prefix path1/path2 \
--query 'reverse(sort_by(Contents,&LastModified))[0]'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With