I am using the code below to get a list of all file names from an S3 bucket. I have two buckets in S3. For one of the buckets the code returns all the file names (more than 1000), but the same code returns only 1000 file names for the other bucket. I just don't get what is happening. Why does the same code work for one bucket and not for the other?
Also, my bucket has a hierarchical structure: folder/filename.jpg.
    ObjectListing objects = s3.listObjects("bucket.new.test");
    do {
        for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
            String key = objectSummary.getKey();
            System.out.println(key);
        }
        objects = s3.listNextBatchOfObjects(objects);
    } while (objects.isTruncated());
Improving on @Abhishek's answer. This code is slightly shorter and variable names are fixed.
You have to get the object listing, add its contents to the collection, then get the next batch of objects from the listing. Repeat the operation until the listing is no longer truncated.
    List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
    ObjectListing objects = s3.listObjects("bucket.new.test");
    keyList.addAll(objects.getObjectSummaries());

    while (objects.isTruncated()) {
        objects = s3.listNextBatchOfObjects(objects);
        keyList.addAll(objects.getObjectSummaries());
    }
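One plausible reading of the symptom in the question is that its do/while loop tests isTruncated() on the batch it has just fetched but not yet printed, so the final batch is never printed. The sketch below simulates that with an in-memory stand-in instead of the real S3 client. Everything in it (PaginationDemo, Page, fetch, questionLoop, answerLoop, the page size and bucket sizes) is hypothetical and only mirrors the shape of the two loops; it also assumes that asking for a page past the end yields an empty, non-truncated page.

```java
import java.util.ArrayList;
import java.util.List;

class PaginationDemo {

    /** Hypothetical stand-in for one page of a listing (not an SDK class). */
    static final class Page {
        final List<String> keys;
        final boolean truncated; // true when more pages follow this one
        final int nextStart;     // index at which the next page begins

        Page(List<String> keys, boolean truncated, int nextStart) {
            this.keys = keys;
            this.truncated = truncated;
            this.nextStart = nextStart;
        }
    }

    /** Returns the page starting at 'start' from a pretend bucket holding 'total' keys. */
    static Page fetch(int total, int start, int pageSize) {
        int end = Math.min(start + pageSize, total);
        List<String> keys = new ArrayList<>();
        for (int i = start; i < end; i++) {
            keys.add("folder/file" + i + ".jpg");
        }
        return new Page(keys, end < total, end);
    }

    /** Same shape as the loop in the question: fetch the next page, then test its flag. */
    static List<String> questionLoop(int total, int pageSize) {
        List<String> out = new ArrayList<>();
        Page page = fetch(total, 0, pageSize);
        do {
            out.addAll(page.keys);
            page = fetch(total, page.nextStart, pageSize);
        } while (page.truncated); // the last page is fetched but never added
        return out;
    }

    /** Same shape as the loop in the answer: every fetched page is consumed. */
    static List<String> answerLoop(int total, int pageSize) {
        List<String> out = new ArrayList<>();
        Page page = fetch(total, 0, pageSize);
        out.addAll(page.keys);
        while (page.truncated) {
            page = fetch(total, page.nextStart, pageSize);
            out.addAll(page.keys);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(questionLoop(1500, 1000).size()); // prints 1000
        System.out.println(answerLoop(1500, 1000).size());   // prints 1500
    }
}
```

Under these assumptions, a simulated bucket of 1500 keys yields exactly 1000 keys from the question-style loop, which matches the reported behaviour, while the answer-style loop returns all 1500.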
For Scala developers, here is a recursive function that performs a full scan and map of the contents of an Amazon S3 bucket using the official AWS SDK for Java:
    import com.amazonaws.services.s3.AmazonS3Client
    import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing}
    import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

    def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

      def scan(acc: List[T], listing: ObjectListing): List[T] = {
        val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
        val mapped = (for (summary <- summaries) yield f(summary)).toList

        // On the last page, return everything accumulated so far plus this page
        if (!listing.isTruncated) acc ::: mapped
        else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
      }

      scan(List(), s3.listObjects(bucket, prefix))
    }
To invoke the above curried map() function, pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix in the first parameter list. In the second parameter list, pass the function f() you want to apply to each object summary.
For example,

    val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))

will return the full list of (key, owner) tuples in that bucket/prefix, or

    map(s3, "bucket", "prefix")(s => println(s))

will print every object summary, in the style you would normally use with monads in functional programming.
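For readers staying in Java, the same idea (apply a function to every item of every page and collect the results) can be sketched without recursion. This is only an illustration: PagedMap, Page, mapAll and pagesOf are hypothetical names, and Page is an in-memory abstraction standing in for the role that ObjectListing plays in the SDK, so the block runs without an S3 client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class PagedMap {

    /** Hypothetical page abstraction (not an SDK type). */
    interface Page<S> {
        List<S> items();
        boolean isTruncated(); // true when next() would return more items
        Page<S> next();
    }

    /** Applies f to every item on every page, starting from 'first'. */
    static <S, T> List<T> mapAll(Page<S> first, Function<? super S, ? extends T> f) {
        List<T> out = new ArrayList<>();
        Page<S> page = first;
        while (true) {
            for (S item : page.items()) {
                out.add(f.apply(item));
            }
            if (!page.isTruncated()) {
                return out; // the last page has already been consumed above
            }
            page = page.next();
        }
    }

    /** Demo helper: exposes an in-memory list as pages of at most 'pageSize' items. */
    static <S> Page<S> pagesOf(List<S> all, int from, int pageSize) {
        int to = Math.min(from + pageSize, all.size());
        return new Page<S>() {
            public List<S> items() { return all.subList(from, to); }
            public boolean isTruncated() { return to < all.size(); }
            public Page<S> next() { return pagesOf(all, to, pageSize); }
        };
    }
}
```

For instance, PagedMap.mapAll(PagedMap.pagesOf(keys, 0, 2), String::length) walks a list of keys two at a time and returns the length of each key. With the real client, a small adapter from ObjectListing to Page would be needed; that adapter is left out here.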