
Python - List files and folders in Bucket

I am playing around with the boto library to access an Amazon S3 bucket. I am trying to list all the files and folders in a given folder in the bucket. I use this to get all the files and folders:

for key in bucket.list():
    print(key.name)

This gives me all the files and folders in the root, along with the files inside each sub-folder, like this:

root/
 file1
 file2
 folder1/file3
 folder1/file4
 folder1/folder2/file5
 folder1/folder2/file6

How can I list only the contents of, say, folder1, so that it lists something like:

files:
 file3
 file4

folders:
 folder2

I can navigate to a folder using:

for key in bucket.list(prefix='path/to/folder/'):

but in that case it lists the files in folder2 as files of folder1, because I am trying to use string manipulations on the bucket path. I have tried every scenario and it still breaks when there are longer paths and when folders contain multiple files and folders (and those folders have more files). Is there a recursive way to deal with this issue?

asked Feb 13 '15 by Beginner

3 Answers

All of the information in the other answers is correct, but because so many people store objects with path-like keys in S3, the API does provide some tools to help you deal with them.

For example, in your case, if you wanted to list only the "subdirectories" of root without listing all of the objects below them, you would do this:

for key in bucket.list(prefix='root/', delimiter='/'):
    print(key.name)

which should produce the output:

root/file1
root/file2
root/folder1/

You could then do:

for key in bucket.list(prefix='root/folder1/', delimiter='/'):
    print(key.name)

and get:

root/folder1/file3
root/folder1/file4
root/folder1/folder2/

And so forth. You can probably accomplish what you want with this approach.
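Under the hood, a delimiter listing yields two kinds of objects, which is enough to build exactly the files/folders split you described: Key instances for objects and boto.s3.prefix.Prefix instances for the common prefixes. A minimal sketch, assuming boto 2 and a hypothetical bucket name:

from boto.s3.connection import S3Connection
from boto.s3.prefix import Prefix

conn = S3Connection()                  # credentials from environment / boto config
bucket = conn.get_bucket('my-bucket')  # hypothetical bucket name

files, folders = [], []
for item in bucket.list(prefix='root/folder1/', delimiter='/'):
    # Prefix objects represent the "folders"; everything else is a Key (file)
    if isinstance(item, Prefix):
        folders.append(item.name)
    else:
        files.append(item.name)

print('files:', files)      # ['root/folder1/file3', 'root/folder1/file4']
print('folders:', folders)  # ['root/folder1/folder2/']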

answered Oct 06 '22 by garnaat


What I found most difficult to fully grasp about S3 is that it is simply a key/value store, not a disk or the kind of file-based store most people are familiar with. The fact that people refer to keys as folders and values as files adds to the initial confusion of working with it.

Being a key/value store, keys are simply identifiers, not actual paths into a directory structure. This means that you don't need to create folders before referencing them; you can simply put an object in a bucket at a location like /path/to/my/object without first having to create the "directory" /path/to/my, as the sketch below shows.
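To make that concrete, here is a small sketch, assuming boto 2 and a hypothetical bucket name, that writes an object at a nested key without creating any intermediate "directories" first:

from boto.s3.connection import S3Connection
from boto.s3.key import Key

conn = S3Connection()
bucket = conn.get_bucket('my-bucket')  # hypothetical bucket name

k = Key(bucket)
k.key = 'path/to/my/object'            # no need to create 'path/to/my' first
k.set_contents_from_string('some data')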

Because S3 is a key/value store, the API for interacting with it is more object- and hash-based than file-based. This means that, whether you use Amazon's native API or boto, functions like s3.bucket.Bucket.list will list all the objects in a bucket, optionally filtered on a prefix. If you specify a prefix /foo/bar, then everything with that prefix will be listed, including /foo/bar/file, /foo/bar/blargh/file, /foo/bar/1/2/3/file, etc.

So the short answer is that you will need to filter out the results that you don't want from your call to s3.bucket.Bucket.list because functions like s3.bucket.Bucket.list, s3.bucket.Bucket.get_all_keys, etc. are all designed to return all keys under the prefix that you specify as a filter.
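One way to do that filtering client-side, as a sketch assuming boto 2 and a hypothetical bucket name: strip the prefix off each key name and keep only keys with no further '/' in the remainder, i.e. the direct children:

from boto.s3.connection import S3Connection

conn = S3Connection()
bucket = conn.get_bucket('my-bucket')  # hypothetical bucket name

prefix = 'root/folder1/'
for key in bucket.list(prefix=prefix):
    remainder = key.name[len(prefix):]
    if '/' not in remainder:           # direct child, not nested deeper
        print(key.name)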

answered Oct 06 '22 by Bruce P


S3 has no concept of "folders" as you may think of them. It's a flat, single-level hierarchy where objects are stored by key.

If you need to do a single-level listing inside a folder, you'll have to constrain the listing in your code, with something like if key.name.count('/') == 1, as sketched below.
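Applied to the question's layout, that constraint might look like this (a sketch assuming bucket is an open boto 2 bucket; the depth is computed from the prefix rather than hard-coded to 1, so it also works for nested folders):

prefix = 'root/folder1/'
depth = prefix.count('/')              # depth of the folder being listed
for key in bucket.list(prefix=prefix):
    if key.name.count('/') == depth:   # same depth => a direct child file
        print(key.name)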

answered Oct 06 '22 by Tasos Vogiatzoglou