
How to access files within subfolders of a GCS bucket using Python?

from google.cloud import storage
import os

client = storage.Client()  # create the client before using it
bucket = client.get_bucket('path to bucket')

The above code connects me to my bucket but I am struggling to connect with a specific folder within the bucket.

I am trying variants of this code, but no luck:

blob = bucket.get_blob("training/bad")
blob = bucket.get_blob("/training/bad")
blob = bucket.get_blob("path to bucket/training/bad")

I am hoping to get a list of the images within the bad subfolder, but I can't seem to do so. I don't even fully understand what a blob is despite reading the docs, and I am sort of winging it based on tutorials.

Thank you.

Moondra asked Feb 18 '19 01:02



2 Answers

What you missed is that objects in a GCS bucket aren't organized in a filesystem-like directory hierarchy, but in a flat namespace.

A more detailed explanation can be found in How Subdirectories Work (in the gsutil context, true, but the fundamental reason is the same - the GCS flat namespace):

gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.

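Since there are no (sub)directories in GCS, training/bad doesn't really exist as a container, so you can't open it and list its contents. That is most likely why the get_blob() calls fail: get_blob() looks up a single object by its exact name and returns None when no object has that name. All you can do is list the objects in the bucket and select the ones whose names start with training/bad/. Note that object names are relative to the bucket and normally have no leading slash, so a prefix of /training/bad only matches objects whose names literally begin with "/".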
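To make the flat namespace concrete, here is a minimal sketch in plain Python with no GCS calls. The object names and the helper names_under are hypothetical, made up for illustration; the point is only that a "folder" listing is nothing more than a match on the start of the object name.

```python
# Hypothetical object names in one bucket; GCS stores them as flat strings.
object_names = [
    "training/bad/img_001.jpg",
    "training/bad/img_002.jpg",
    "training/good/img_003.jpg",
    "validation/bad/img_004.jpg",
]


def names_under(names, prefix):
    """Return the names that start with `prefix` (a "folder" listing)."""
    return [name for name in names if name.startswith(prefix)]


# Only the two objects whose names begin with "training/bad/" match.
bad_images = names_under(object_names, "training/bad/")
print(bad_images)

# A leading slash matches nothing here, because no name starts with "/".
print(names_under(object_names, "/training/bad/"))
```

The trailing slash in the prefix matters too: without it, "training/bad" would also match a name such as "training/badly_lit/x.jpg".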

Dan Cornilescu answered Oct 17 '22 14:10


If you would like to find the blobs (files) that exist under a specific prefix (subdirectory), you can pass the prefix and delimiter arguments to the list_blobs() function.

See the following example, taken from the Google Listing Objects example (also available as a GitHub snippet):

from google.cloud import storage


def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        a/1.txt
        a/b/2.txt

    If you just specify prefix='a/', you'll get back:

        a/1.txt
        a/b/2.txt

    However, if you specify prefix='a/' and delimiter='/', you'll get back:

        a/1.txt

    and blobs.prefixes will contain the "subfolder" a/b/.
    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        print(blob.name)

    # blobs.prefixes is only populated once the iterator has been consumed.
    if delimiter:
        print('Prefixes:')
        for sub_prefix in blobs.prefixes:
            print(sub_prefix)
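
Applied to the question, a rough sketch might look like the following. The helper list_folder, the bucket name "your-bucket-name", and the main() wrapper are assumptions for illustration, not part of the Google sample. It relies on Client.list_blobs() accepting a bucket name plus prefix and delimiter, and it skips the zero-byte placeholder object named exactly "training/bad/" that can exist if the folder was created through the Cloud Console.

```python
def list_folder(client, bucket_name, folder):
    """Return the names of the objects directly inside `folder`.

    `client` is expected to be a google.cloud.storage.Client (or any
    object exposing a compatible list_blobs method).
    """
    # Normalize "training/bad", "/training/bad/" etc. to "training/bad/".
    prefix = folder.strip("/") + "/"
    blobs = client.list_blobs(bucket_name, prefix=prefix, delimiter="/")
    # Drop the folder placeholder object itself, if one exists.
    return [blob.name for blob in blobs if blob.name != prefix]


def main():
    # Imported here so the helper above can be exercised without the library.
    from google.cloud import storage

    client = storage.Client()
    # "your-bucket-name" is a placeholder, not a real bucket.
    for name in list_folder(client, "your-bucket-name", "training/bad"):
        print(name)
```

Calling main() requires valid credentials and a real bucket name. Each returned name can then be passed to bucket.blob(name) to download or read that image.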

ScottMcC answered Oct 17 '22 15:10