Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Quick way to list all files in Amazon S3 bucket?

Tags:

amazon-s3

People also ask

How do I view contents of a S3 bucket?

To open the overview pane for an objectSign in to the AWS Management Console and open the Amazon S3 console at https://console.aws.amazon.com/s3/ . In the Buckets list, choose the name of the bucket that contains the object. In the Objects list, choose the name of the object for which you want an overview.

How do you list items in a bucket?

To get a list of objects within a bucket, use the AmazonS3 client's listObjects method, supplying the name of a bucket. The listObjects method returns an ObjectListing object that provides information about the objects in the bucket.

What CLI command will list all of the S3 buckets you have access to?

Description. Returns a list of all buckets owned by the authenticated sender of the request. To use this operation, you must have the s3:ListAllMyBuckets permission. See 'aws help' for descriptions of global parameters.

How can you download an S3 bucket including all folders and files?

To download an entire bucket to your local file system, use the AWS CLI sync command, passing it the s3 bucket as a source and a directory on your file system as a destination, e.g. aws s3 sync s3://YOUR_BUCKET . . The sync command recursively copies the contents of the source to the destination.


I'd recommend using boto. Then it's a quick couple of lines of python:

from boto.s3.connection import S3Connection

conn = S3Connection('access-key','secret-access-key')
bucket = conn.get_bucket('bucket')
for key in bucket.list():
    print(key.name.encode('utf-8'))

Save this as list.py, open a terminal, and then run:

$ python list.py > results.txt

AWS CLI

Documentation for aws s3 ls

AWS have recently release their Command Line Tools. This works much like boto and can be installed using sudo easy_install awscli or sudo pip install awscli

Once you have installed, you can then simply run

aws s3 ls

Which will show you all of your available buckets

CreationTime Bucket
       ------------ ------
2013-07-11 17:08:50 mybucket
2013-07-24 14:55:44 mybucket2

You can then query a specific bucket for files.

Command:

aws s3 ls s3://mybucket

Output:

Bucket: mybucket
Prefix:

      LastWriteTime     Length Name
      -------------     ------ ----
                           PRE somePrefix/
2013-07-25 17:06:27         88 test.txt

This will show you all of your files.


s3cmd is invaluable for this kind of thing

$ s3cmd ls -r s3://yourbucket/ | awk '{print $4}' > objects_in_bucket


Be carefull, amazon list only returns 1000 files. If you want to iterate over all files you have to paginate the results using markers :

In ruby using aws-s3

bucket_name = 'yourBucket'
marker = ""

AWS::S3::Base.establish_connection!(
  :access_key_id => 'your_access_key_id',
  :secret_access_key => 'your_secret_access_key'
)

loop do
  objects = Bucket.objects(bucket_name, :marker=>marker, :max_keys=>1000)
  break if objects.size == 0
  marker = objects.last.key

  objects.each do |obj|
      puts "#{obj.key}"
  end
end

end

Hope this helps, vincent


Update 15-02-2019:

This command will give you a list of all buckets in AWS S3:

aws s3 ls

This command will give you a list of all top-level objects inside an AWS S3 bucket:

aws s3 ls bucket-name

This command will give you a list of ALL objects inside an AWS S3 bucket:

aws s3 ls bucket-name --recursive

This command will place a list of ALL inside an AWS S3 bucket... inside a text file in your current directory:

aws s3 ls bucket-name --recursive | cat >> file-name.txt


There are couple of ways you can go about it. Using Python

import boto3

sesssion = boto3.Session(aws_access_key_id, aws_secret_access_key)

s3 = sesssion.resource('s3')

bucketName = 'testbucket133'
bucket = s3.Bucket(bucketName)

for obj in bucket.objects.all():
    print(obj.key)

Another way is using AWS cli for it

aws s3 ls s3://{bucketname}
example : aws s3 ls s3://testbucket133

For Scala developers, here it is recursive function to execute a full scan and map the contents of an AmazonS3 bucket using the official AWS SDK for Java

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc:List[T], listing:ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    if (!listing.isTruncated) mapped.toList
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() you want to apply to map each object summary in the second parameter list.

For example

val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))

will return the full list of (key, owner) tuples in that bucket/prefix

or

map(s3, "bucket", "prefix")(s => println(s))

as you would normally approach by Monads in Functional Programming