Amazon S3 returns only 1000 entries for one bucket and all for another bucket (using the Java SDK)?


I am using the code below to list all file names in an S3 bucket. I have two buckets in S3. For one bucket the code returns all the file names (more than 1000), but for the other bucket the same code returns only 1000 file names. I don't understand what is happening. Why does the same code work for one bucket and not for the other?

Also, my bucket has a hierarchical structure: folder/filename.jpg.

ObjectListing objects = s3.listObjects("bucket.new.test");
do {
    for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
        String key = objectSummary.getKey();
        System.out.println(key);
    }
    objects = s3.listNextBatchOfObjects(objects);
} while (objects.isTruncated());


asked Oct 12 '12 06:10

Abhishek

2 Answers

Improving on @Abhishek's answer. This code is slightly shorter and variable names are fixed.

You have to get the object listing, add its contents to the collection, then get the next batch of objects from the listing. Repeat the operation until the listing is no longer truncated.

List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
ObjectListing objects = s3.listObjects("bucket.new.test");
keyList.addAll(objects.getObjectSummaries());

while (objects.isTruncated()) {
    objects = s3.listNextBatchOfObjects(objects);
    keyList.addAll(objects.getObjectSummaries());
}
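To see why the do/while loop in the question can stop short, here is a small simulation that does not touch S3 at all. `FakeListing`, `page`, `keys`, and `PagingDemo` are hypothetical stand-ins made up for illustration (they are not AWS SDK classes). The only assumptions carried over are that a listing holds at most 1000 keys and that asking for the batch after the last one yields an empty, non-truncated listing, which is how I understand `listNextBatchOfObjects` to behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for ObjectListing; NOT the AWS SDK class.
class FakeListing {
    final List<String> keys;   // keys contained in this page
    final boolean truncated;   // true if more pages follow this one
    final int nextStart;       // index at which the next page begins

    FakeListing(List<String> keys, boolean truncated, int nextStart) {
        this.keys = keys;
        this.truncated = truncated;
        this.nextStart = nextStart;
    }
}

class PagingDemo {
    static final int PAGE_SIZE = 1000; // assumed page limit, mirroring S3's 1000 keys per listing

    // Returns one page of at most PAGE_SIZE keys, starting at index 'start'.
    // Past the end it returns an empty, non-truncated page.
    static FakeListing page(List<String> all, int start) {
        int end = Math.min(start + PAGE_SIZE, all.size());
        return new FakeListing(all.subList(start, end), end < all.size(), end);
    }

    // Same shape as the loop in the question: the page fetched at the bottom
    // of the loop is only processed if that page is itself truncated.
    static int questionLoop(List<String> all) {
        int seen = 0;
        FakeListing objects = page(all, 0);
        do {
            seen += objects.keys.size();
            objects = page(all, objects.nextStart);
        } while (objects.truncated);
        return seen;
    }

    // Same shape as the loop in this answer: every fetched page is processed
    // before the truncation flag is checked again.
    static int answerLoop(List<String> all) {
        FakeListing objects = page(all, 0);
        int seen = objects.keys.size();
        while (objects.truncated) {
            objects = page(all, objects.nextStart);
            seen += objects.keys.size();
        }
        return seen;
    }

    // Builds n made-up keys in the folder/filename.jpg style.
    static List<String> keys(int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add("folder/file" + i + ".jpg");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(questionLoop(keys(1500))); // prints 1000
        System.out.println(answerLoop(keys(1500)));   // prints 1500
    }
}
```

If the simulation matches the SDK's behaviour, the question's loop skips the final page once a bucket has more than 1000 keys, so the bucket that seemed to return everything may also have been missing its last batch without it being noticed.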

answered Oct 09 '22 03:10

oferei

For Scala developers, here is a recursive function that performs a full scan and map of the contents of an Amazon S3 bucket using the official AWS SDK for Java.

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    // Last page: return the accumulated results plus this page,
    // not just this page, otherwise earlier batches are lost.
    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the curried map() function above, pass the already constructed and properly initialized AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name, and the prefix in the first parameter list. In the second parameter list, pass the function f() you want to apply to each object summary.

For example

val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))

returns the full list of (key, owner) tuples in that bucket/prefix, or

map(s3, "bucket", "prefix")(s => println(s))

prints each object summary, as you would normally approach it with monads in functional programming.
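For readers staying in Java, the same "apply a function to every summary on every page" idea can be sketched with `java.util.function.Function`. This is only an illustration: `KeyPage` and `PagedMap.mapAll` are hypothetical names, and the linked pages stand in for what `listObjects` and `listNextBatchOfObjects` would return from the real SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical page type standing in for ObjectListing; NOT part of the AWS SDK.
class KeyPage {
    final List<String> keys; // keys on this page
    final KeyPage next;      // following page, or null when this is the last page

    KeyPage(List<String> keys, KeyPage next) {
        this.keys = keys;
        this.next = next;
    }
}

class PagedMap {
    // Applies f to every key on every page and collects the results in page order.
    static <T> List<T> mapAll(KeyPage first, Function<String, T> f) {
        List<T> out = new ArrayList<>();
        for (KeyPage p = first; p != null; p = p.next) {
            for (String key : p.keys) {
                out.add(f.apply(key));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        KeyPage second = new KeyPage(Arrays.asList("folder/c.jpg"), null);
        KeyPage first = new KeyPage(Arrays.asList("folder/a.jpg", "folder/b.jpg"), second);
        System.out.println(mapAll(first, String::length)); // prints [12, 12, 12]
    }
}
```

The loop plays the role of the accumulator in the recursive Scala version: results from earlier pages stay in `out` while later pages are appended.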

answered Oct 09 '22 04:10

pangiole
