I want to get all the folders inside a given Google Cloud Storage bucket or folder using the Google Cloud Storage API.
For example, if gs://abc/xyz contains three folders, gs://abc/xyz/x1, gs://abc/xyz/x2, and gs://abc/xyz/x3, the API should return all three folders in gs://abc/xyz.
This is easy to do with gsutil:
gsutil ls gs://abc/xyz
But I need to do it using Python and the Google Cloud Storage API.
This question is about listing the folders inside a bucket or folder. None of the suggestions worked for me, and after experimenting with the google.cloud.storage SDK, I suspect it is not possible (as of November 2019) to list the sub-directories of any path in a bucket. It is possible with the REST API, so I wrote this little wrapper:
from google.api_core import page_iterator
from google.cloud import storage


def _item_to_value(iterator, item):
    # Each item under "prefixes" is already a plain string; pass it through.
    return item


def list_directories(bucket_name, prefix):
    # Make sure the prefix names a "folder", i.e. ends with the delimiter.
    if prefix and not prefix.endswith('/'):
        prefix += '/'

    extra_params = {
        "projection": "noAcl",
        "prefix": prefix,
        "delimiter": '/'
    }

    gcs = storage.Client()

    # Path of the objects.list endpoint for this bucket.
    path = "/b/" + bucket_name + "/o"

    # Page through the responses, reading "prefixes" instead of "items".
    iterator = page_iterator.HTTPIterator(
        client=gcs,
        api_request=gcs._connection.api_request,
        path=path,
        items_key='prefixes',
        item_to_value=_item_to_value,
        extra_params=extra_params,
    )

    return list(iterator)
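The two request parameters doing the work in the wrapper are prefix and delimiter. With both set, the service groups every object name that contains the delimiter after the prefix into a single entry under prefixes, rather than listing the object itself. The following is a minimal local sketch of that grouping, not the service's implementation. The function name rollup_prefixes and the sample object names are made up for illustration, reusing the gs://abc/xyz layout from the question.

```python
def rollup_prefixes(object_names, prefix="", delimiter="/"):
    """Emulate locally how prefix + delimiter group object names into prefixes."""
    # Same normalisation as the wrapper: the prefix should end with the delimiter.
    if prefix and not prefix.endswith(delimiter):
        prefix += delimiter

    prefixes = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue  # outside the requested "folder"
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The delimiter occurs after the prefix, so the name is rolled up
            # into one entry that ends with the delimiter.
            prefixes.add(prefix + head + delimiter)
    return sorted(prefixes)


# Hypothetical object names inside bucket "abc".
names = [
    "xyz/x1/a.txt",
    "xyz/x1/b.txt",
    "xyz/x2/c.txt",
    "xyz/x3/deep/d.txt",
    "xyz/readme.txt",
    "other/e.txt",
]

print(rollup_prefixes(names, "xyz"))
# ['xyz/x1/', 'xyz/x2/', 'xyz/x3/']
```

Note that xyz/readme.txt is not rolled up, because nothing follows the prefix except the file name itself, and that only the first level below the prefix is reported.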
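For example, if my-bucket contains objects under dog-bark/datasets/v1/ and dog-bark/datasets/v2/, then calling list_directories('my-bucket', 'dog-bark/datasets') will return the following (each prefix returned by the API ends with the delimiter):
['dog-bark/datasets/v1/', 'dog-bark/datasets/v2/']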
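For comparison, the same prefix and delimiter can also be passed through the client library's own Client.list_blobs, whose returned iterator exposes a prefixes set once its pages have been consumed. This is a hedged sketch rather than a confirmed replacement for the wrapper. It assumes a google-cloud-storage release in which that attribute is populated, which may not match the version tested in November 2019. The function name list_subdirectories and its optional client parameter are hypothetical.

```python
def list_subdirectories(bucket_name, prefix, client=None):
    """Return the first-level "folders" under prefix, as reported by list_blobs."""
    if client is None:
        # Imported lazily so the function can also be exercised with a stand-in client.
        from google.cloud import storage
        client = storage.Client()

    # Same normalisation as list_directories: the prefix must end with '/'.
    if prefix and not prefix.endswith('/'):
        prefix += '/'

    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter='/')

    # The prefixes set is filled in as pages are fetched, so the iterator has
    # to be consumed before reading it.
    for _ in blobs:
        pass

    return sorted(blobs.prefixes)
```

With default credentials configured, list_subdirectories('my-bucket', 'dog-bark/datasets') would be expected to report the same prefixes as list_directories, each ending in the delimiter.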