
How to get list of folders in a given bucket using Google Cloud API

I want to list all the folders inside a given Google Cloud Storage bucket or folder using the Google Cloud Storage API.

For example, if gs://abc/xyz contains three folders, gs://abc/xyz/x1, gs://abc/xyz/x2 and gs://abc/xyz/x3, the API should return all three folders in gs://abc/xyz.

This is easy to do with gsutil:

gsutil ls gs://abc/xyz

But I need to do it in Python with the Google Cloud Storage API.

Shamshad Alam asked May 06 '16 14:05



1 Answer

This question is about listing the folders inside a bucket or folder. None of the suggestions worked for me, and after experimenting with the google.cloud.storage SDK I suspect that (as of November 2019) it offers no direct way to list the sub-directories of a path in a bucket. The REST API can do it, so I wrote this small wrapper:

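from google.api_core import page_iterator
from google.cloud import storage

def _item_to_value(iterator, item):
    # Each entry under "prefixes" is already a plain string; return it unchanged.
    return item

def list_directories(bucket_name, prefix):
    # Make sure the prefix ends with the delimiter so only its children match.
    if prefix and not prefix.endswith('/'):
        prefix += '/'

    # With a delimiter set, the object listing groups deeper object names
    # into "prefixes", which play the role of sub-directories.
    extra_params = {
        "projection": "noAcl",
        "prefix": prefix,
        "delimiter": '/'
    }

    gcs = storage.Client()

    # REST path of the object listing for this bucket.
    path = "/b/" + bucket_name + "/o"

    # Note: _connection is a private attribute of the client and may change
    # between library releases.
    iterator = page_iterator.HTTPIterator(
        client=gcs,
        api_request=gcs._connection.api_request,
        path=path,
        items_key='prefixes',
        item_to_value=_item_to_value,
        extra_params=extra_params,
    )

    # Consume every page and collect the prefixes into a list.
    return list(iterator)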
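
To see what the prefix and delimiter parameters in the wrapper above are doing, it can help to reproduce the grouping locally. Cloud Storage stores a flat list of object names, and a delimiter listing collapses every name that continues past the next delimiter into a single prefix entry, which runs up to and including that delimiter. The sketch below is only an offline illustration of that rule, not the service's implementation; the function name emulate_prefixes and the object names (built from the gs://abc/xyz example in the question) are made up for the example.

```python
def emulate_prefixes(object_names, prefix, delimiter="/"):
    """Mimic how a delimiter listing groups flat object names into prefixes."""
    found = set()
    for name in object_names:
        # Only names under the requested prefix take part in the listing.
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        cut = rest.find(delimiter)
        # Names with a delimiter after the prefix collapse into one entry
        # that ends with the delimiter; names without one are plain objects.
        if cut != -1:
            found.add(prefix + rest[:cut + len(delimiter)])
    return sorted(found)


# Hypothetical object names inside bucket "abc".
names = [
    "xyz/x1/a.txt",
    "xyz/x1/b.txt",
    "xyz/x2/a.txt",
    "xyz/x3/deep/c.txt",
    "xyz/notes.txt",
]

print(emulate_prefixes(names, "xyz/"))
# prints ['xyz/x1/', 'xyz/x2/', 'xyz/x3/']
```

Note that xyz/notes.txt produces no prefix, because nothing follows it past a delimiter, and that xyz/x3/deep/c.txt only contributes xyz/x3/, since the grouping stops at the first delimiter after the prefix.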

For example, if you have my-bucket containing:

  • dog-bark
    • datasets
      • v1
      • v2

Then calling list_directories('my-bucket', 'dog-bark/datasets') will return:

['dog-bark/datasets/v1/', 'dog-bark/datasets/v2/']
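
For comparison, here is a sketch of the same listing without the private _connection attribute. It assumes a google-cloud-storage release in which Client.list_blobs accepts prefix and delimiter and the iterator it returns exposes a prefixes attribute that is filled in as pages are fetched; if your installed version behaves differently, the wrapper above remains the fallback. The function name list_subdirectories is made up for this example, and the client is passed in rather than created inside the function.

```python
def list_subdirectories(client, bucket_name, prefix=""):
    """Return the sub-directory prefixes directly under `prefix`.

    `client` is assumed to be a google.cloud.storage.Client, or any object
    with a compatible list_blobs method.
    """
    # Make sure the prefix ends with the delimiter so only its children match.
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")

    # The prefixes are collected page by page, so the iterator has to be
    # consumed completely before reading them.
    for _ in blobs:
        pass

    return sorted(blobs.prefixes)


# Assumed usage (needs credentials and network access):
#   from google.cloud import storage
#   list_subdirectories(storage.Client(), "my-bucket", "dog-bark/datasets")
```

Passing the client in keeps the function easy to try against a stub and avoids creating a new client on every call.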

Antony Harfield answered Oct 05 '22 01:10