
Google Cloud Storage + Python: Any way to list objects in a certain folder in GCS?


I'm going to write a Python program to check whether a file is in a certain folder of my Google Cloud Storage bucket. The basic idea is to get the list of all objects in that folder as a list of file names, then check whether the file abc.txt is in that list.
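Once the names are in hand, the check itself is a plain membership test. A minimal sketch, assuming the object names have already been fetched as strings (the helper `file_in_folder` and the sample names are hypothetical). GCS has a flat namespace, so a "folder" is just a shared name prefix ending in `/`:

```python
def file_in_folder(object_names, folder, filename):
    """Return True if `folder/filename` appears among the listed object names.

    GCS object names are flat strings; a "folder" is only a name prefix
    ending in '/', so build the full object name and test membership.
    """
    full_name = folder.rstrip("/") + "/" + filename
    return full_name in set(object_names)


# Hypothetical listing of objects.
names = ["abc/myfolder/abc.txt", "abc/myfolder/other.txt", "abc/readme.md"]
print(file_in_folder(names, "abc/myfolder", "abc.txt"))     # True
print(file_in_folder(names, "abc/myfolder/", "readme.md"))  # False
```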

The problem is that Google seems to provide only one way to get the object list, which is uri.get_bucket(); see the code below, taken from https://developers.google.com/storage/docs/gspythonlibrary#listing-objects

    uri = boto.storage_uri(DOGS_BUCKET, GOOGLE_STORAGE)
    for obj in uri.get_bucket():
        print '%s://%s/%s' % (uri.scheme, uri.bucket_name, obj.name)
        print '  "%s"' % obj.get_contents_as_string()

The drawback of uri.get_bucket() is that it seems to fetch every object in the bucket first, which I don't want. I only need the list of object names in a particular folder (e.g. gs://mybucket/abc/myfolder), which should be much quicker.
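For context, listing in GCS is not a directory walk: the server filters object names by a string prefix and, if a delimiter is given, rolls deeper names up into "sub-folder" prefixes instead of returning them. The sketch below only simulates that behavior locally on a plain list of names (`simulate_listing` and the sample data are hypothetical, not part of any client library), to show why a prefix ending in `/` plus `delimiter='/'` yields just one folder's direct contents:

```python
def simulate_listing(names, prefix="", delimiter=None):
    """Mimic GCS list semantics on a local list of object names.

    Returns (objects, prefixes): objects whose names start with `prefix`,
    and, when `delimiter` is set, the distinct "sub-folder" prefixes for
    names that contain the delimiter after `prefix`.
    """
    objects, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue  # outside the requested "folder"
        rest = name[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll deeper names up into a single prefix entry.
            cut = rest.index(delimiter) + len(delimiter)
            prefixes.add(prefix + rest[:cut])
        else:
            objects.append(name)
    return objects, sorted(prefixes)


bucket_names = [
    "abc/myfolder/abc.txt",
    "abc/myfolder/sub/deep.txt",
    "abc/myfolder2/x.txt",
    "other/y.txt",
]
objs, subdirs = simulate_listing(bucket_names, prefix="abc/myfolder/", delimiter="/")
print(objs)     # ['abc/myfolder/abc.txt']
print(subdirs)  # ['abc/myfolder/sub/']
```

Dropping the trailing slash (`prefix="abc/myfolder"`) would also match `abc/myfolder2/x.txt`, since the prefix is a plain string comparison.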

Could someone help? I appreciate every answer!

Reed_Xia asked Mar 14 '14 at 07:03




2 Answers

Update: the answer below applies to the older "Google API Client Libraries" for Python. If you aren't already using that client, prefer the newer "Google Cloud Client Library" for Python ( https://googleapis.dev/python/storage/latest/index.html ). With the newer library, the equivalent of the code below is:

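    from google.cloud import storage

    client = storage.Client()
    # prefix is a plain string match; 'abc/myfolder/' (with a trailing slash)
    # would avoid also matching objects such as 'abc/myfolder2/...'.
    for blob in client.list_blobs('bucketname', prefix='abc/myfolder'):
        print(str(blob))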
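Applied to the original question, a minimal sketch with the newer library might list only one folder level and look for the file name. It assumes `client` behaves like a `google.cloud.storage.Client` and is passed in (so the function itself needs no import); `folder_contains` is a hypothetical helper name. The trailing slash plus `delimiter="/"` keeps the listing to that single folder:

```python
def folder_contains(client, bucket_name, folder, filename):
    """Return True if `folder/filename` exists, listing only that folder.

    `client` is expected to behave like google.cloud.storage.Client;
    list_blobs() fetches result pages lazily as the loop advances.
    """
    prefix = folder.rstrip("/") + "/"
    target = prefix + filename
    # delimiter="/" keeps objects in sub-folders out of the results.
    for blob in client.list_blobs(bucket_name, prefix=prefix, delimiter="/"):
        if blob.name == target:
            return True  # stop early; later pages are never requested
    return False


# Usage (needs credentials and a real bucket):
# from google.cloud import storage
# client = storage.Client()
# print(folder_contains(client, "bucketname", "abc/myfolder", "abc.txt"))
```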

Answer for older client follows.

You may find it easier to work with the JSON API, which has a full-featured Python client. It has a function for listing objects that takes a prefix parameter, which you could use to check for a certain directory and its children in this manner:

    import json

    from googleapiclient import discovery

    # Auth goes here if necessary: create an authorized http object
    # and pass it to build() as http=...
    client = discovery.build('storage', 'v1')
    request = client.objects().list(
        bucket="mybucket",
        prefix="abc/myfolder")
    while request is not None:
        response = request.execute()
        print(json.dumps(response, indent=2))
        # list_next() returns None when there are no more pages.
        request = client.objects().list_next(request, response)
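The same paging pattern can collect just the object names into a Python list. A sketch under these assumptions: `client` is the object returned by `discovery.build('storage', 'v1')`, and `list_object_names` is a hypothetical helper. Each response is a dict whose `items` entries carry a `name` key; `items` may be absent when a page has no objects, hence the `.get()`:

```python
def list_object_names(client, bucket, prefix):
    """Collect object names under `prefix` across all result pages.

    `client` is assumed to come from discovery.build('storage', 'v1').
    """
    names = []
    objects = client.objects()
    request = objects.list(bucket=bucket, prefix=prefix)
    while request is not None:
        response = request.execute()
        names.extend(item["name"] for item in response.get("items", []))
        # list_next() builds the request for the next page, or returns None.
        request = objects.list_next(request, response)
    return names
```

With the question's example, `'abc/myfolder/abc.txt' in list_object_names(client, 'mybucket', 'abc/myfolder/')` would answer the original check.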

Fuller documentation of the list call is here: https://developers.google.com/storage/docs/json_api/v1/objects/list

And the Google Python API client is documented here: https://code.google.com/p/google-api-python-client/

Brandon Yarbrough answered Sep 16 '22 at 13:09


This worked for me:

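    from google.cloud import storage

    client = storage.Client()
    BUCKET_NAME = 'DEMO_BUCKET'
    bucket = client.get_bucket(BUCKET_NAME)

    # list_blobs() returns an iterator over every object in the bucket.
    blobs = bucket.list_blobs()
    for blob in blobs:
        print(blob.name)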

The list_blobs() method returns an iterator over the blobs in the bucket, so you can loop over it to access every object. In this example I just print each object's name.
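Called with no arguments, `bucket.list_blobs()` walks the whole bucket. A sketch closer to the original question passes `prefix` to it, or skips listing entirely and asks about a single object with `Blob.exists()`. Here `bucket` is assumed to be a `google.cloud.storage.Bucket` such as the one returned by `client.get_bucket()`; the helper names are hypothetical:

```python
def names_in_folder(bucket, folder):
    """Names of all objects under `folder`, including nested ones."""
    prefix = folder.rstrip("/") + "/"
    return [blob.name for blob in bucket.list_blobs(prefix=prefix)]


def file_exists(bucket, path):
    """Check one object directly with a metadata lookup instead of a listing."""
    return bucket.blob(path).exists()


# Usage (needs credentials and a real bucket):
# bucket = client.get_bucket('DEMO_BUCKET')
# print(file_exists(bucket, 'abc/myfolder/abc.txt'))
```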

This documentation helped me a lot:

  • https://googleapis.github.io/google-cloud-python/latest/storage/blobs.html

  • https://googleapis.github.io/google-cloud-python/latest/_modules/google/cloud/storage/client.html#Client.bucket

I hope this helps!

Sharif Elfouly answered Sep 17 '22 at 13:09