
How can I get only the latest file(s) created or modified in an S3 location using Python?

Using boto, I tried the code below:

from boto.s3.connection import S3Connection
conn = S3Connection('XXX', 'YYYY')

bucket = conn.get_bucket('myBucket')

file_list = bucket.list('just/a/prefix/')

However, I am unable to get the length of file_list or its last element, because it is a BucketListResultSet. Please suggest a solution for this scenario.

meenakshi asked Mar 31 '16 20:03



2 Answers

You are using the boto library, which is obsolete and no longer maintained. The number of open issues with it keeps growing.

It is better to use boto3, which is actively developed.

First, let us define parameters of our search:

>>> bucket_name = "bucket_of_m"
>>> prefix = "region/cz/"

Import boto3 and create s3, an object representing the S3 resource:

>>> import boto3
>>> s3 = boto3.resource("s3")

Get the bucket:

>>> bucket = s3.Bucket(name=bucket_name)
>>> bucket
s3.Bucket(name='bucket_of_m')

Define a filter for objects with the given prefix:

>>> res = bucket.objects.filter(Prefix=prefix)
>>> res
s3.Bucket.objectsCollection(s3.Bucket(name='bucket_of_m'), s3.ObjectSummary)

and iterate over it:

>>> for obj in res:
...     print(obj.key)
...     print(obj.size)
...     print(obj.last_modified)
...

Each obj is an ObjectSummary (not the Object itself), but it holds enough information to tell you the key, size and last-modified time:

>>> obj
s3.ObjectSummary(bucket_name='bucket_of_m', key=u'region/cz/Ostrava/Nadrazni.txt')
>>> type(obj)
boto3.resources.factory.s3.ObjectSummary

You can get the Object from it and use it as needed:

>>> o = obj.Object()
>>> o
s3.Object(bucket_name='bucket_of_m', key=u'region/cz/rodos/fusion/AdvancedDataFusion.xml')

There are not many options for filtering, but Prefix is available.
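Since the filter cannot select or sort by date, and S3 returns listings ordered by key rather than by modification time, one way to find the newest file is to list everything under the prefix and sort on last_modified on the client side. The following is a minimal sketch under those assumptions, not a definitive implementation. The helper names latest_objects and latest_in_bucket are hypothetical, and latest_in_bucket assumes boto3 is installed and AWS credentials are configured.

```python
from operator import attrgetter


def latest_objects(summaries, count=1):
    """Return the `count` most recently modified items, newest first.

    `summaries` can be any iterable whose items have a `last_modified`
    attribute, e.g. the ObjectSummary items yielded by
    bucket.objects.filter(Prefix=...).
    """
    ordered = sorted(summaries, key=attrgetter("last_modified"), reverse=True)
    return ordered[:count]


def latest_in_bucket(bucket_name, prefix, count=1):
    """Hypothetical convenience wrapper around the boto3 resource API."""
    import boto3  # imported here so latest_objects() is usable without boto3

    bucket = boto3.resource("s3").Bucket(bucket_name)
    # Every key under the prefix is listed before sorting, so this costs
    # one request per page of results.
    return latest_objects(bucket.objects.filter(Prefix=prefix), count)
```

For example, latest_in_bucket("bucket_of_m", "region/cz/", count=3) would return the three newest summaries under that prefix. For prefixes holding very many keys, the full listing may be slow, because every page has to be fetched before the sort can run.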

Jan Vlcinsky answered Nov 07 '22 05:11


As an addendum to Jan's answer:


It seems that the boto3 library has changed in the meantime and now (version 1.6.19 at the time of writing) offers more parameters for the filter method:

object_summary_iterator = bucket.objects.filter(
    Delimiter='string',
    EncodingType='url',
    Marker='string',
    MaxKeys=123,
    Prefix='string',
    RequestPayer='requester'
)

Three useful parameters to limit the number of entries for your scenario are Marker, MaxKeys and Prefix:

Marker (string) -- Specifies the key to start with when listing objects in a bucket.
MaxKeys (integer) -- Sets the maximum number of keys returned in the response. The response might contain fewer keys but will never contain more.
Prefix (string) -- Limits the response to keys that begin with the specified prefix.

Two notes:

  • The key you specify for Marker will not be included in the result, i.e. the listing starts from the key following the one you specify as Marker.

  • The boto3 library paginates the results automatically. The size of each page is determined by the MaxKeys parameter of the filter function (defaulting to 1000).

    If you iterate over the s3.Bucket.objectsCollection object past the end of a page, it automatically downloads the next page. While this is generally useful, it can be surprising: if you specify e.g. MaxKeys=10 expecting to iterate over only 10 keys, the iterator still goes over all matched keys, just with a new request to the server every 10 keys.
    So, if you want just e.g. the first three results, break out of the loop manually instead of relying on the iterator to stop.

    Unfortunately, the docs do not make this clear and are in fact misleading here: the library's parameter description is copied from the API parameter description, where the sentence "The response might contain fewer keys but will never contain more." does make sense.
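The advice above about breaking out of the loop manually can be sketched with itertools.islice, which stops pulling from the underlying iterator once the requested number of items has been returned. This is only an illustration under stated assumptions: first_n is a hypothetical helper name, and the boto3 call in the comment assumes a bucket and prefix set up as in Jan's answer.

```python
from itertools import islice


def first_n(iterable, n):
    """Return at most `n` items and leave the rest of `iterable` unconsumed."""
    return list(islice(iterable, n))


# Assumed usage with a boto3 collection (needs boto3 and AWS credentials):
#
#     first_three = first_n(bucket.objects.filter(Prefix=prefix), 3)
#
# Iteration stops after three items, so no further pages should be
# requested beyond the one that contains them.
```

boto3 resource collections also provide a limit() method, e.g. bucket.objects.filter(Prefix=prefix).limit(3), which caps the total number of items the iterator returns.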

Czechnology answered Nov 07 '22 03:11