How to list all AWS S3 objects in a bucket using Java

What is the simplest way to get a list of all items within an S3 bucket using Java?

    List<S3ObjectSummary> s3objects = s3.listObjects(bucketName, prefix).getObjectSummaries();

This example returns at most 1000 items, even when the bucket contains more.

Ron D. asked Nov 06 '11 13:11


2 Answers

This might be a workaround, but it solved my problem: keep requesting the next batch while the listing is truncated.

    ObjectListing listing = s3.listObjects(bucketName, prefix);
    List<S3ObjectSummary> summaries = listing.getObjectSummaries();

    while (listing.isTruncated()) {
        listing = s3.listNextBatchOfObjects(listing);
        summaries.addAll(listing.getObjectSummaries());
    }
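
To see why following the truncation flag collects everything, here is a minimal, SDK-free sketch of the same loop shape. Everything in it (`TruncatedListingSketch`, `Page`, `listPage`, `MAX_KEYS`, the fake keys) is a hypothetical in-memory stand-in invented for illustration, not part of the AWS SDK; the page cap of 1000 is only assumed to mirror the limit described in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory stand-in for a paged listing; none of these names come from the AWS SDK.
class TruncatedListingSketch {

    // Page cap chosen to mirror the 1000-item limit described in the question.
    static final int MAX_KEYS = 1000;

    // One page of keys plus the information needed to request the next page.
    static final class Page {
        final List<String> keys;
        final boolean truncated; // true when more keys remain after this page
        final int nextOffset;    // where the next page starts

        Page(List<String> keys, boolean truncated, int nextOffset) {
            this.keys = keys;
            this.truncated = truncated;
            this.nextOffset = nextOffset;
        }
    }

    // Hypothetical "service call": returns at most MAX_KEYS keys starting at offset.
    static Page listPage(List<String> bucket, int offset) {
        int end = Math.min(offset + MAX_KEYS, bucket.size());
        List<String> keys = new ArrayList<>(bucket.subList(offset, end));
        return new Page(keys, end < bucket.size(), end);
    }

    // Fetch the first page, then keep fetching while the listing is truncated.
    static List<String> listAll(List<String> bucket) {
        Page page = listPage(bucket, 0);
        List<String> all = new ArrayList<>(page.keys);
        while (page.truncated) {
            page = listPage(bucket, page.nextOffset);
            all.addAll(page.keys);
        }
        return all;
    }

    // Builds n fake keys: "key-0", "key-1", ...
    static List<String> sampleKeys(int n) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            keys.add("key-" + i);
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> bucket = sampleKeys(2500);
        System.out.println(listPage(bucket, 0).keys.size()); // a single call: 1000
        System.out.println(listAll(bucket).size());          // following the flag: 2500
    }
}
```

Note that the sketch copies the first page into a fresh list before appending to it, which avoids relying on the returned list being mutable.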

Ron D. answered Sep 21 '22 06:09


For those reading this in 2018 or later: two newer APIs handle pagination for you, one in AWS SDK for Java 1.x and another in 2.x.

1.x

There is a new API in the Java SDK that lets you iterate through the objects in an S3 bucket without dealing with pagination:

    AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();

    S3Objects.inBucket(s3, "the-bucket").forEach((S3ObjectSummary objectSummary) -> {
        // TODO: Consume `objectSummary` the way you need
        System.out.println(objectSummary.getKey());
    });

This iteration is lazy:

The list of S3ObjectSummarys will be fetched lazily, a page at a time, as they are needed. The size of the page can be controlled with the withBatchSize(int) method.
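
The "a page at a time, as they are needed" behaviour can be sketched without the SDK. The class below is a hypothetical illustration only: `LazyKeysSketch`, `batchSize`, and `fetchCount` are made-up names, and the in-memory list stands in for a bucket. It is not how `S3Objects` is implemented; it just shows one way an iterator can load the next batch only when the current one runs out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical lazy, batch-at-a-time iterable over an in-memory "bucket"; not the AWS SDK.
class LazyKeysSketch implements Iterable<String> {
    private final List<String> store; // stand-in for the bucket contents
    private final int batchSize;      // how many keys one simulated fetch returns
    int fetchCount = 0;               // number of simulated fetches so far

    LazyKeysSketch(List<String> store, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.store = store;
        this.batchSize = batchSize;
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private int nextOffset = 0;
            private Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Load the next batch only once the current one is exhausted.
                while (!current.hasNext() && nextOffset < store.size()) {
                    int end = Math.min(nextOffset + batchSize, store.size());
                    current = new ArrayList<>(store.subList(nextOffset, end)).iterator();
                    nextOffset = end;
                    fetchCount++;
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        LazyKeysSketch keys = new LazyKeysSketch(
                Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), 4);
        Iterator<String> it = keys.iterator();
        it.next();                           // forces the first batch to load
        System.out.println(keys.fetchCount); // one fetch so far
        for (String key : keys) {            // a full pass loads the remaining batches as needed
            System.out.println(key);
        }
    }
}
```

A caller that stops early, for example after the first matching key, never pays for the batches it did not reach.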

2.x

The API changed, so here is an SDK 2.x version:

    S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
    ListObjectsV2Request request = ListObjectsV2Request.builder()
            .bucket("the-bucket")
            .prefix("the-prefix")
            .build();
    ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);

    for (ListObjectsV2Response page : response) {
        page.contents().forEach((S3Object object) -> {
            // TODO: Consume `object` the way you need
            System.out.println(object.key());
        });
    }

ListObjectsV2Iterable is lazy as well:

When the operation is called, an instance of this class is returned. At this point, no service calls are made yet and so there is no guarantee that the request is valid. As you iterate through the iterable, SDK will start lazily loading response pages by making service calls until there are no pages left or your iteration stops. If there are errors in your request, you will see the failures only after you start iterating through the iterable.
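
The practical consequence of that note, that a bad request fails only once iteration starts, can be shown with a small SDK-free sketch. All names here (`DeferredPagesSketch`, `fetchPage`, `serviceCalls`, `PAGE_SIZE`) are hypothetical, and the "service" is just an in-memory list; the point is only the timing of the call and of the failure, not the real paginator's internals.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical page iterable that defers every "service call" until iteration; not the AWS SDK.
class DeferredPagesSketch implements Iterable<List<String>> {
    static final int PAGE_SIZE = 2;   // arbitrary page size for the illustration
    private final String bucket;      // a null or empty name plays the role of an invalid request
    private final List<String> store; // stand-in for the bucket contents
    int serviceCalls = 0;             // number of simulated service calls so far

    // Constructing the iterable makes no call and validates nothing.
    DeferredPagesSketch(String bucket, List<String> store) {
        this.bucket = bucket;
        this.store = store;
    }

    // Simulated service call: request validation happens here, not in the constructor.
    private List<String> fetchPage(int offset) {
        serviceCalls++;
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket name is required");
        }
        return store.subList(offset, Math.min(offset + PAGE_SIZE, store.size()));
    }

    @Override
    public Iterator<List<String>> iterator() {
        return new Iterator<List<String>>() {
            private int offset = 0;
            private boolean first = true; // the first page is always requested

            @Override
            public boolean hasNext() {
                return first || offset < store.size();
            }

            @Override
            public List<String> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                first = false;
                List<String> page = fetchPage(offset);
                offset += page.size();
                return page;
            }
        };
    }

    public static void main(String[] args) {
        DeferredPagesSketch bad = new DeferredPagesSketch(null, Arrays.asList("a", "b", "c"));
        System.out.println(bad.serviceCalls); // still 0: nothing has been requested yet
        try {
            for (List<String> page : bad) {
                System.out.println(page);
            }
        } catch (IllegalArgumentException e) {
            System.out.println("failed during iteration: " + e.getMessage());
        }
    }
}
```

In real code this suggests wrapping the loop itself, not just the paginator call, in whatever error handling you need.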

madhead - StandWithUkraine answered Sep 22 '22 06:09