 

Amazon S3 returns only 1000 entries for one bucket and all for another bucket (using Java SDK)?


I am using the code below to get a list of all file names from an S3 bucket. I have two buckets in S3. For one bucket the code returns all of the file names (more than 1000), but the same code returns only 1000 file names for the other bucket. I just don't get what is happening. Why does the same code work for one bucket and not the other?

Also, my buckets have a hierarchical structure: folder/filename.jpg.

ObjectListing objects = s3.listObjects("bucket.new.test");
do {
    for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
        String key = objectSummary.getKey();
        System.out.println(key);
    }
    objects = s3.listNextBatchOfObjects(objects);
} while (objects.isTruncated());
Abhishek asked Oct 12 '12



2 Answers

Improving on @Abhishek's answer. This version is slightly shorter and the variable names are fixed.

You have to get the object listing, add its contents to the collection, and then fetch the next batch of objects from the listing. Repeat until the listing is no longer truncated. This also explains the symptom in the question: the original do/while printed a batch, fetched the next one, and only then tested isTruncated() on the fresh listing, so the keys in the last fetched batch were never printed. A bucket whose listing fits in a single batch of 1000 appears complete, while a larger bucket comes up short.

List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
ObjectListing objects = s3.listObjects("bucket.new.test");
keyList.addAll(objects.getObjectSummaries());

while (objects.isTruncated()) {
    objects = s3.listNextBatchOfObjects(objects);
    keyList.addAll(objects.getObjectSummaries());
}
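For completeness, a minimal sketch of what you might do with keyList once the loop finishes; it only relies on getters that S3ObjectSummary actually provides (getKey and getSize):

// keyList now holds one S3ObjectSummary per object in the bucket.
for (S3ObjectSummary summary : keyList) {
    System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
}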
oferei answered Oct 09 '22


For Scala developers, here is a recursive function to execute a full scan and map of the contents of an AmazonS3 bucket using the official AWS SDK for Java:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc: List[T], listing: ObjectListing): List[T] = {
    // Map this batch of object summaries through f.
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    // Return everything accumulated so far, recursing while the
    // listing is still truncated.
    if (!listing.isTruncated) acc ::: mapped
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the above curried map() function, pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() you want to apply to map each object summary in the second parameter list. A sketch of constructing such a client follows.
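In case it helps, here is a minimal sketch (shown in Java, SDK v1) of constructing the client. It assumes your credentials live in the local AWS profile file; any other AWSCredentialsProvider works the same way, and the Scala equivalent is a direct transliteration.

import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3Client;

// Reads credentials from ~/.aws/credentials; swap in another
// AWSCredentialsProvider if your setup differs.
AmazonS3Client s3 = new AmazonS3Client(new ProfileCredentialsProvider());

With the client in hand, the invocation examples below apply.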

For example

val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner)) 

will return the full list of (key, owner) tuples in that bucket/prefix

or

map(s3, "bucket", "prefix")(s => println(s)) 

which simply prints each object summary, in the style you would normally use with monads in functional programming.

pangiole answered Oct 09 '22