from google.cloud import storage

client = storage.Client()  # create a client before getting the bucket
bucket = client.get_bucket('path to bucket')
The above code connects me to my bucket, but I am struggling to access a specific folder within the bucket.
I have tried variants of this code, with no luck:
blob = bucket.get_blob("training/bad")
blob = bucket.get_blob("/training/bad")
blob = bucket.get_blob("path to bucket/training/bad")
I am hoping to get a list of the images within the bad subfolder, but I can't seem to do so. I don't even fully understand what a blob is, despite reading the docs; I've mostly been winging it based on tutorials.
Thank you.
What you missed is the fact that objects in a GCS bucket aren't organized in a filesystem-like directory structure/hierarchy, but rather in a flat structure.
A more detailed explanation can be found in How Subdirectories Work (written in the gsutil context, true, but the fundamental reason is the same: the GCS flat namespace):
gsutil provides the illusion of a hierarchical file tree atop the "flat" name space supported by the Google Cloud Storage service. To the service, the object gs://your-bucket/abc/def.txt is just an object that happens to have "/" characters in its name. There is no "abc" directory; just a single object with the given name.
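You can see this from the Python client as well. A minimal sketch (reusing the bucket object from your own snippet):

# Uploading an object whose name contains '/' creates no directory;
# 'abc/def.txt' is simply the complete, flat object name.
blob = bucket.blob('abc/def.txt')
blob.upload_from_string('hello')
print(blob.name)  # -> abc/def.txt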
Since there are no (sub)directories in GCS, /training/bad doesn't really exist, so you can't list its contents. All you can do is list all the objects in the bucket and select the ones whose names/paths start with the training/bad prefix (note that object names normally have no leading slash).
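For example, a minimal sketch (assuming your bucket is named 'my-bucket'; the bucket name is a placeholder):

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket('my-bucket')  # hypothetical bucket name

# List everything in the bucket, then keep only the names
# under the training/bad/ prefix.
bad_images = [blob.name for blob in bucket.list_blobs()
              if blob.name.startswith('training/bad/')]
print(bad_images)

That said, filtering client-side is wasteful for large buckets; the prefix argument shown below pushes the filtering to the service itself.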
If you would like to find blobs (files) that exist under a specific prefix (subdirectory), you can specify prefix and delimiter arguments to the list_blobs() function.
See the following example, taken from the Google Listing Objects example (also a GitHub snippet):
from google.cloud import storage

def list_blobs_with_prefix(bucket_name, prefix, delimiter=None):
    """Lists all the blobs in the bucket that begin with the prefix.

    This can be used to list all blobs in a "folder", e.g. "public/".

    The delimiter argument can be used to restrict the results to only the
    "files" in the given "folder". Without the delimiter, the entire tree under
    the prefix is returned. For example, given these blobs:

        /a/1.txt
        /a/b/2.txt

    If you just specify prefix = '/a', you'll get back:

        /a/1.txt
        /a/b/2.txt

    However, if you specify prefix='/a' and delimiter='/', you'll get back:

        /a/1.txt
    """
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)

    blobs = bucket.list_blobs(prefix=prefix, delimiter=delimiter)

    print('Blobs:')
    for blob in blobs:
        print(blob.name)

    # blobs.prefixes is populated only once the iterator has been consumed
    if delimiter:
        print('Prefixes:')
        for prefix in blobs.prefixes:
            print(prefix)
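For your layout, the call might look like this (hypothetical bucket name; note the prefix has no leading slash and ends with '/'):

list_blobs_with_prefix('my-bucket', prefix='training/bad/', delimiter='/')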