I'm writing a Python program to check whether a file is in a certain folder of my Google Cloud Storage bucket. The basic idea is to get the list of all objects in the folder (a file name list), then check whether the file abc.txt is in that list.
The problem is that Google seems to provide only one way to get the object list, which is uri.get_bucket(); see the code below, which is from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects

    uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
    for obj in uri.get_bucket():
        print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
        print ' "%s"' % obj.get_contents_as_string()

The drawback of uri.get_bucket() is that it appears to fetch all of the objects first, which I don't want. I just need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.
Could someone help answer? Appreciate every answer!
List the objects in a bucket. In the Google Cloud console, go to the Cloud Storage Buckets page. In the bucket list, click on the name of the bucket whose contents you want to view.
ls - List providers, buckets, or objects.
After installing and configuring the Google Cloud SDK, the gsutil command can be run from the Windows cmd prompt by typing its name followed by its argument(s).
Update: the below is true for the older "Google API Client Libraries" for Python, but if you're not using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). For the newer library, the equivalent to the below code is:

    from google.cloud import storage

    client = storage.Client()
    for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
        print(str(blob))

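One thing to watch with a bare prefix such as 'abc/myfolder' is that prefix matching is plain string matching, so it would also match objects under a sibling like 'abc/myfolder_old/'. A minimal sketch of collecting just the names under one folder, assuming the newer google-cloud-storage client; the helper name `list_names_in_folder` is made up for illustration and is not part of the library:

```python
def list_names_in_folder(client, bucket_name, folder):
    """Return the names of all objects whose name starts with '<folder>/'.

    `client` is expected to behave like google.cloud.storage.Client,
    i.e. to offer list_blobs(bucket_name, prefix=...).
    """
    # Normalise to exactly one trailing slash so that 'abc/myfolder' does not
    # also match sibling prefixes such as 'abc/myfolder_old/'.
    prefix = folder.strip('/') + '/'
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]


# Usage (needs the google-cloud-storage package and valid credentials):
#   from google.cloud import storage
#   names = list_names_in_folder(storage.Client(), 'mybucket', 'abc/myfolder')
#   print('abc/myfolder/abc.txt' in names)
```

Note that the returned names are full object names (including the folder part), so the membership check compares against 'abc/myfolder/abc.txt' rather than just 'abc.txt'.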
Answer for older client follows.
You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:

    import json

    from apiclient import discovery

    # Auth goes here if necessary. Create authorized http object...
    client = discovery.build('storage', 'v1')  # add http=whatever param if auth

    request = client.objects().list(bucket="mybucket", prefix="abc/myfolder")
    while request is not None:
        response = request.execute()
        print(json.dumps(response, indent=2))
        # list_next lives on the objects() collection, not on the request;
        # it returns None once there are no more pages.
        request = client.objects().list_next(request, response)

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list
And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/
This worked for me:

    from google.cloud import storage

    client = storage.Client()
    BUCKET_NAME = 'DEMO_BUCKET'
    bucket = client.get_bucket(BUCKET_NAME)
    blobs = bucket.list_blobs()
    for blob in blobs:
        print(blob.name)

The list_blobs() method will return an iterator used to find blobs in the bucket. Now you can iterate over blobs and access every object in the bucket. In this example I just print out the name of the object.
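If the goal is only to know whether one specific file is in a folder, it may be possible to skip listing altogether and ask for that single object by name. This is a different technique from the listing shown above: a rough sketch, assuming the google-cloud-storage `Bucket.blob()` and `Blob.exists()` calls; the helper name `file_in_folder` is hypothetical:

```python
def file_in_folder(bucket, folder, filename):
    """Return True if '<folder>/<filename>' exists in the given bucket.

    `bucket` is expected to behave like google.cloud.storage.Bucket,
    i.e. bucket.blob(name) returns an object with an exists() method.
    """
    # Build the full object name, tolerating leading/trailing slashes on folder.
    object_name = folder.strip('/') + '/' + filename
    # blob() only builds a local handle; exists() performs the actual lookup.
    return bucket.blob(object_name).exists()


# Usage (needs the google-cloud-storage package and valid credentials):
#   from google.cloud import storage
#   bucket = storage.Client().bucket('DEMO_BUCKET')
#   print(file_in_folder(bucket, 'abc/myfolder', 'abc.txt'))
```

This checks one object name at a time, so it suits a single lookup; for many files in the same folder, listing once with a prefix is probably the better fit.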
This documentation helped me a lot:
https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html
https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket
I hope I could help!