Get all versions of an object in an AWS S3 bucket?

I've enabled object versioning on a bucket, and I want to get all versions of a key inside that bucket, but I cannot find a method to do this. How would one accomplish this using the S3 APIs?

asked May 27 '14 at 05:05 by ZZB




1 Answer

So, I ran into this same brick wall this morning: this seemingly trivial task turns out to be incredibly difficult.

The API you want is the GET Bucket Object versions API (ListObjectVersions, exposed in boto3 as list_object_versions), but it is, sadly, non-trivial to use.

First, you have to steer clear of some non-solutions: KeyMarker, which is documented by boto3 as,

KeyMarker (string) -- Specifies the key to start with when listing objects in a bucket.

…does not start with the specified key when listing objects in a bucket; rather, it starts immediately after that key, which makes it somewhat useless here.

The best restriction this API provides is Prefix. It isn't perfect, though: other keys that merely start with our key (for example, report.csv.bak when we want report.csv) will be returned as well.
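Because Prefix can also return longer keys, every entry in a response needs an exact-match check on its Key. A minimal sketch, assuming the response dict shape that boto3's list_object_versions returns (Versions and DeleteMarkers lists, either of which may be absent, whose entries carry a Key); the helper name exact_key_entries and the sample keys are hypothetical:

```python
def exact_key_entries(response, key):
    """Return (versions, delete_markers) from one list_object_versions
    response page, keeping only entries whose Key is exactly `key`."""
    # boto3 omits these lists entirely when they are empty, hence .get().
    versions = [v for v in response.get("Versions", []) if v["Key"] == key]
    markers = [m for m in response.get("DeleteMarkers", []) if m["Key"] == key]
    return versions, markers


# Hypothetical page, trimmed to the fields used here.
page = {
    "Versions": [
        {"Key": "report.csv", "VersionId": "v2"},
        {"Key": "report.csv", "VersionId": "v1"},
        {"Key": "report.csv.bak", "VersionId": "v9"},  # matched by Prefix only
    ],
    "DeleteMarkers": [{"Key": "report.csv.bak", "VersionId": "d1"}],
}
versions, markers = exact_key_entries(page, "report.csv")
print([v["VersionId"] for v in versions])  # ['v2', 'v1']
print(markers)  # []
```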

Also beware of MaxKeys. It is tempting to think that, lexicographically, our key should come first, with every key that has our key as a prefix following it, so we could trim those extras using MaxKeys. Sadly, MaxKeys controls not how many keys are returned in the response but how many versions are. (And I'm going to presume the number of versions isn't known in advance.)

So Prefix seems to be the best that can be done. Also note that, at least in some languages, the client library will not handle pagination for you, so you'll need to handle that yourself as well.

As an example in boto3:

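import boto3

client = boto3.client('s3')
bucket_name = 'my-bucket'      # placeholder: your bucket
key_name = 'path/to/my-key'    # placeholder: the key whose versions you want

response = client.list_object_versions(
    Bucket=bucket_name, Prefix=key_name,
)
while True:
    # Process `response`; Prefix can also match other keys,
    # so compare each entry's 'Key' against key_name.
    ...
    # Check if the results got paginated:
    if response['IsTruncated']:
        response = client.list_object_versions(
            Bucket=bucket_name, Prefix=key_name,
            KeyMarker=response['NextKeyMarker'],
            VersionIdMarker=response['NextVersionIdMarker'],
        )
    else:
        break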
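If you'd rather not track NextKeyMarker and NextVersionIdMarker by hand, boto3 also offers a paginator for this operation (client.get_paginator("list_object_versions")), which follows those markers across pages for you. A minimal sketch, assuming a boto3 S3 client is passed in; the function name all_versions_of_key and the bucket and key in the usage comment are hypothetical:

```python
def all_versions_of_key(client, bucket, key):
    """Collect every version and delete marker of exactly `key`.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client("s3").
    The paginator issues follow-up requests with KeyMarker/VersionIdMarker
    until the listing is no longer truncated.
    """
    versions, delete_markers = [], []
    paginator = client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # Prefix may also match longer keys, so keep exact matches only.
        versions += [v for v in page.get("Versions", []) if v["Key"] == key]
        delete_markers += [
            m for m in page.get("DeleteMarkers", []) if m["Key"] == key
        ]
    return versions, delete_markers


# Usage (hypothetical bucket and key):
#   import boto3
#   versions, markers = all_versions_of_key(
#       boto3.client("s3"), "my-bucket", "path/to/my-key")
```

Collecting delete markers separately is a design choice: they are part of a key's version history but have no content to download, so callers usually treat them differently.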

answered Oct 09 '22 at 20:10 by Thanatos