I use the boto S3 API in my Python script, which slowly copies data from S3 to my local filesystem. The script worked well for a couple of days, but now there is a problem.
I use the following API function to obtain the list of keys in a "directory":
keys = bucket.get_all_keys(prefix=dirname)
This function (get_all_keys) does not always return the full list of keys: I can see more keys through the AWS web interface or via aws s3 ls s3://path.
I reproduced the issue on boto versions 2.15 and 2.30.
Maybe boto caches some of my requests to S3 (I repeat the same requests over and over again)? How can I resolve this issue? Any suggestions?
There is an easier way. The Bucket object itself can act as an iterator, and it knows how to handle paginated responses: if there are more results available, it will automatically fetch them behind the scenes. Something like this should allow you to iterate over all of the objects in your bucket:
for key in bucket:
    # do something with your key
If you want to specify a prefix and get a listing of all keys starting with that prefix, you can do it like this:
for key in bucket.list(prefix='foobar'):
    # do something with your key
Or, if you really, really want to build up a list of objects, just do this:
keys = [k for k in bucket]
Note, however, that buckets can hold an unlimited number of keys, so be careful with this: it will build a list of all keys in memory.
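If you only need aggregate information, you can avoid materializing the list at all. Here is a minimal sketch, assuming a placeholder bucket name ('my-bucket') and credentials that boto can pick up from the environment or ~/.boto; it counts keys and sums their sizes while iterating lazily:
import boto

# Iterate the listing lazily so memory stays flat even for very large buckets.
# 'my-bucket' and the 'foobar' prefix are placeholders for this sketch.
conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')

count = 0
total_size = 0
for key in bucket.list(prefix='foobar'):
    count += 1
    total_size += key.size  # size is returned with each listing entry

print('%d keys, %d bytes under prefix' % (count, total_size))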
Just managed to get it working!
It turned out that I had 1013 keys in my directory on S3, and get_all_keys can return only 1000 keys due to AWS API restrictions.
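For reference, get_all_keys corresponds to a single S3 list request, which is capped at 1000 keys, so fetching more requires paging. Below is a sketch of doing that manually, assuming boto 2's result set exposes is_truncated and that marker-based paging works as in the S3 listing API; the bucket name and prefix are placeholders. The higher-level fix that follows is simpler.
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')  # placeholder bucket name
dirname = 'some/prefix/'               # placeholder prefix

keys = []
marker = ''
while True:
    batch = bucket.get_all_keys(prefix=dirname, marker=marker)
    keys.extend(batch)
    if not batch.is_truncated:
        break
    marker = batch[-1].name  # resume after the last key returned

print('fetched %d keys' % len(keys))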
The solution is simple: just use the more high-level function, without the delimiter parameter:
keys = list(bucket.list(prefix=dirname))
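To tie this back to the original use case, here is a minimal sketch of the copy-to-local loop, assuming a placeholder bucket name and a local target directory downloads/; bucket.list() pages through the results automatically, so the 1000-key limit no longer matters:
import os
import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('my-bucket')  # placeholder bucket name
dirname = 'some/prefix/'               # placeholder prefix

for key in bucket.list(prefix=dirname):
    if key.name.endswith('/'):
        continue  # skip placeholder "directory" keys
    local_path = os.path.join('downloads', key.name)
    local_dir = os.path.dirname(local_path)
    if local_dir and not os.path.isdir(local_dir):
        os.makedirs(local_dir)
    key.get_contents_to_filename(local_path)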