 

List files on S3

I'm getting frustrated by not finding any good explanation of how to list all files in an S3 bucket.

I have a bucket with about 20 images in it. All I want to do is list them. Someone says "just use the S3.list method", but without a special library there is no S3.list method. I have an S3.get method, which I can't get to work. Arggh, I would appreciate it if someone could tell me how to simply get a list of all files (filenames) from an S3 bucket.

val S3files = S3.get(bucketName: String, path: Option[String], prefix: Option[String], delimiter: Option[String])

returns a Future[Response]

I don't know how to use this S3.get. What would be the easiest way to list all files in my S3 bucket?

Answers much appreciated!

malmling asked Jun 27 '13 11:06


3 Answers

With Scala you might now want to use Amazon's official SDK for Java, which provides the AmazonS3::listObjects method:

import scala.collection.JavaConverters._
import com.amazonaws.services.s3.model.ObjectListing

def keys(bucket: String): List[String] = nextBatch(s3Client.listObjects(bucket))

private def nextBatch(listing: ObjectListing, keys: List[String] = Nil): List[String] = {

  val pageKeys = listing.getObjectSummaries.asScala.map(_.getKey).toList

  if (listing.isTruncated)
    nextBatch(s3Client.listNextBatchOfObjects(listing), pageKeys ::: keys)
  else
    pageKeys ::: keys
}

Note the recursion on ObjectListing objects:

Since keys in a bucket are listed in batches (using a pagination system as documented here), s3Client.listObjects(bucket).getObjectSummaries.asScala.map(_.getKey) on its own would return at most the first 1000 keys.

Hence the recursive call: it asks for the next page of keys for as long as ObjectListing::isTruncated is true, until all keys in the bucket have been collected.

Beware of memory issues if your bucket is huge though.
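If memory is a concern, one option is to walk the pages lazily instead of accumulating every key in a List. Below is a minimal sketch of that idea, not part of the SDK: `pages`, `Page`, `fetch` and the fake bucket are hypothetical names made up for illustration, so that the example runs without any AWS dependency. With the SDK calls used above, the first page would presumably be `s3Client.listObjects(bucket)`, the truncation check `_.isTruncated`, and the next-page function `s3Client.listNextBatchOfObjects`.

```scala
object LazyListing {

  // Stand-in for one page of results (hypothetical). With the AWS SDK this
  // role is played by ObjectListing.
  final case class Page(number: Int, keys: List[String], isTruncated: Boolean)

  // Hypothetical helper: an iterator over pages that fetches a page only
  // when the consumer asks for it.
  def pages[P](firstPage: => P)(isTruncated: P => Boolean, fetchNext: P => P): Iterator[P] =
    new Iterator[P] {
      // The page most recently handed out; None before the first request.
      private var previous: Option[P] = None

      // There is more to read before the first page, or while the
      // previous page was truncated.
      def hasNext: Boolean = previous.forall(isTruncated)

      def next(): P = {
        if (!hasNext) throw new NoSuchElementException("no more pages")
        val page = previous.fold(firstPage)(fetchNext)
        previous = Some(page)
        page
      }
    }

  // Fake three-page "bucket" used only to make the sketch runnable.
  private val fakeBucket = Vector(List("a.png", "b.png"), List("c.png"), List("d.png"))

  private def fetch(n: Int): Page =
    Page(n, fakeBucket(n), isTruncated = n < fakeBucket.size - 1)

  // All keys of the fake bucket, produced page by page.
  def demoKeys(): Iterator[String] =
    pages(fetch(0))(_.isTruncated, p => fetch(p.number + 1)).flatMap(_.keys)

  def main(args: Array[String]): Unit =
    demoKeys().foreach(println)
}
```

Because the result is an Iterator, keys can be processed one at a time (for example with `foreach`) and only the current page has to stay in memory. Calling `toList` on it brings back the memory cost of the recursive version.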


s3Client can be built as such:

import com.amazonaws.services.s3.{AmazonS3, AmazonS3ClientBuilder}
import com.amazonaws.auth.{AWSStaticCredentialsProvider, BasicAWSCredentials}

val credentials = new BasicAWSCredentials(awsKey, awsAccessKey)
val s3Client: AmazonS3 = AmazonS3ClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(credentials)).build()

with these dependencies in build.sbt (use the latest version):

libraryDependencies ++= Seq(
  "com.amazonaws" % "aws-java-sdk-bom" % "1.11.391",
  "com.amazonaws" % "aws-java-sdk-s3"  % "1.11.391"
)
Xavier Guihot answered Nov 15 '22 08:11


Using the library here:

https://github.com/Rhinofly/play-s3

You should be able to do something like this:

import concurrent.ExecutionContext.Implicits._

val bucket = S3("bucketName")
val result = bucket.list
result.map {
  case Left(error) => throw new Exception("Error: " + error)
  case Right(list) => 
    list.foreach {
        case BucketItem(name, isVirtual) => //...
    }
}

You'll have to tweak this a bit with regard to your credentials, but the examples show how to do that.

cmbaxter answered Nov 15 '22 07:11


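import scala.concurrent.Await
import scala.concurrent.duration._

def listS3Files() = Action {
  // Block for up to 15 seconds until the bucket listing completes
  Await.result(S3("bucketName").list, 15.seconds).fold(
    // Left: the listing failed
    error => {
      Logger.error("Error")
      Status(INTERNAL_SERVER_ERROR)
    },
    // Right: return the listed items as plain text
    success => Ok(success.seq.toString())
  )
}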

Here's my working solution. Thanks to @cmbaxter

malmling answered Nov 15 '22 08:11