How to access files within subfolders of a GCS bucket using Python?

Tags: python-3.x, google-cloud-platform, google-cloud-storage

from google.cloud import storage
import os

client = storage.Client()
bucket = client.get_bucket('path to bucket')

The above code connects me to my bucket but I am struggling to connect with a specific folder within the bucket.

I am trying variants of this code, but no luck:

blob = bucket.get_blob("training/bad")
blob = bucket.get_blob("/training/bad")
blob = bucket.get_blob("path to bucket/training/bad")

I am hoping to get a list of the images within the bad subfolder, but I can't seem to do so. I don't even fully understand what a blob is despite reading the docs, and I am sort of winging it based on tutorials.

Thank you.


asked Feb 18 '19 01:02

Moondra

2 Answers

What you missed is the fact that, in GCS, objects in a bucket aren't organized in a filesystem-like directory hierarchy, but rather in a flat structure.

A more detailed explanation can be found in How Subdirectories Work (in the gsutil context, true, but the fundamental reason is the same - the GCS flat namespace):

gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.

Since there are no (sub)directories in GCS, training/bad doesn't really exist as a directory, so you can't list its contents the way you would on a filesystem. What you can do is list the objects in the bucket and select the ones whose names start with training/bad/ (object names normally have no leading slash, so /training/bad matches nothing).
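To make the flat namespace concrete, here is a minimal sketch that works on plain strings rather than on a real bucket. The object names and the helper names_with_prefix are made up for illustration; with the real client you would collect the names from blob.name while iterating over bucket.list_blobs().

```python
def names_with_prefix(object_names, prefix):
    """Return the object names that start with `prefix`.

    In a flat namespace a "folder" is just a shared name prefix, so
    selecting a folder's contents is plain string matching.
    """
    return [name for name in object_names if name.startswith(prefix)]


# Hypothetical object names, as a bucket listing might return them.
object_names = [
    "training/bad/img_001.jpg",
    "training/bad/img_002.jpg",
    "training/good/img_003.jpg",
    "readme.txt",
]

bad_images = names_with_prefix(object_names, "training/bad/")
print(bad_images)

# A leading slash is part of the name being matched, so this finds nothing.
print(names_with_prefix(object_names, "/training/bad"))
```

Filtering on the client side like this still downloads the full listing, which is why passing the prefix to the listing call itself is usually preferable for large buckets.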


answered Oct 17 '22 14:10

Dan Cornilescu

If you would like to find blobs (files) that exist under a specific prefix (subdirectory), you can pass the prefix and delimiter arguments to the list_blobs() method.

See the following example, taken from the Google Listing Objects example (also available as a GitHub snippet):

from google.cloud import storage


def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        a/1.txt
        a/b/2.txt

    If you just specify prefix='a/', you'll get back:

        a/1.txt
        a/b/2.txt

    However, if you specify prefix='a/' and delimiter='/', you'll get back:

        a/1.txt

    and blobs.prefixes will contain the "subfolder" a/b/.

    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        print(blob.name)

    if delimiter:
        print('Prefixes:')
        for prefix in blobs.prefixes:
            print(prefix)
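How prefix and delimiter interact can be checked without a bucket. The following is only a rough emulation of the listing behaviour on a list of strings, not the client library's code; emulate_listing and the sample names are hypothetical.

```python
def emulate_listing(object_names, prefix="", delimiter=None):
    """Mimic how prefix and delimiter split a flat listing.

    Returns (blobs, prefixes): the names reported as blobs, and the
    "subfolder" prefixes that deeper names are rolled up into.
    """
    blobs, prefixes = [], set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Deeper object: report only its "subfolder" prefix.
            cut = rest.index(delimiter) + len(delimiter)
            prefixes.add(prefix + rest[:cut])
        else:
            blobs.append(name)
    return blobs, sorted(prefixes)


names = ["a/1.txt", "a/b/2.txt", "a/b/3.txt", "c/4.txt"]

# No delimiter: the whole tree under "a/" comes back as blobs.
print(emulate_listing(names, prefix="a/"))

# With a delimiter: only direct children, plus the rolled-up "a/b/".
print(emulate_listing(names, prefix="a/", delimiter="/"))
```

For the question's layout, the equivalent call would presumably be bucket.list_blobs(prefix="training/bad/", delimiter="/"). Note that with the real client, blobs.prefixes is only filled in once the blobs iterator has been consumed, which is why the function above loops over blobs before reading prefixes.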

answered Oct 17 '22 15:10

ScottMcC
