 

Is there a way to get the number of objects in a Google Cloud Storage bucket using Python?

I need to get the number of files in a GCS bucket. I don't want to use list_blobs to read them one by one and increment a counter. Is there some metadata we can query instead?

I need to download all the files in the bucket and process them. I want to do this with threads, so I need to split the files into groups somehow. The idea was to use list_blobs with an offset and size, but to do that I need to know the total number of files.
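For context, the fallback I have so far is to hand the whole listing iterator to a thread pool (a rough sketch; process_blob and the bucket name are placeholders for my real code):

from concurrent.futures import ThreadPoolExecutor

from google.cloud import storage

storage_client = storage.Client()

def process_blob(blob):
    # Placeholder for the real download-and-process step.
    blob.download_to_filename(blob.name)

# Hand the iterator to the pool, which spreads the blobs across threads.
blobs = storage_client.list_blobs('my-bucket')
with ThreadPoolExecutor(max_workers=8) as pool:
    pool.map(process_blob, blobs)

I would still prefer to pre-split the listing into fixed groups per thread, which is why I need the total count up front.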

Any idea?

Thanks

asked Nov 24 '25 by Ido Barash

2 Answers

I know the original question wanted to avoid .list_blobs() for counting the files in a bucket, but since I didn't find a different way, I'm posting it here for reference; it does work:

from google.cloud import storage

storage_client = storage.Client()
# list_blobs() returns a lazy iterator that pages through the bucket's objects.
blobs_list = storage_client.list_blobs(bucket_or_name='name_of_your_bucket')

# Consume the iterator and count; only object metadata is fetched, nothing is downloaded.
print(sum(1 for _ in blobs_list))

.list_blobs() returns an iterator, so this answer simply loops over the iterator and counts the elements; it fetches the object listing, never the file contents.

If you only want to count the files within a certain folder in your bucket, you can use the prefix keyword:

blobs_list = storage_client.list_blobs(
    bucket_or_name='name_of_your_bucket',
    prefix='name_of_your_folder',
)
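One caveat: prefix matches every object whose name starts with that string, including objects in nested "subfolders". If you only want the files directly inside the folder, list_blobs also accepts a delimiter keyword; a small sketch, reusing the placeholder names from above:

blobs_list = storage_client.list_blobs(
    bucket_or_name='name_of_your_bucket',
    prefix='name_of_your_folder/',  # trailing slash matters when using a delimiter
    delimiter='/',
)

print(sum(1 for _ in blobs_list))  # counts only the folder's direct children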

FYI: this question suggests a different method to solve this:
How can I get number of files from gs bucket using python

answered Nov 27 '25 by Sander van den Oord


There's no way to do a single metadata query to get the count. You could run a command like:

gsutil ls gs://my-bucket/** | wc -l

but note that this command makes a number of bucket listing requests behind the scenes, which can take a long time if the bucket is large, and you will be billed for each listing operation it performs.
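As an aside: if a roughly day-old number is good enough, Cloud Monitoring exposes a per-bucket object_count metric that can be read without listing anything; it is sampled about once a day, so it lags behind the real count on busy buckets. A minimal sketch, assuming the google-cloud-monitoring client library and placeholder project/bucket names:

import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

now = int(time.time())
# Look back two days so at least one daily sample falls in the window.
interval = monitoring_v3.TimeInterval(
    {
        "end_time": {"seconds": now},
        "start_time": {"seconds": now - 2 * 24 * 3600},
    }
)

results = client.list_time_series(
    request={
        "name": "projects/my-project-id",  # placeholder project
        "filter": (
            'metric.type="storage.googleapis.com/storage/object_count" '
            'AND resource.labels.bucket_name="my-bucket"'  # placeholder bucket
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# One time series per storage class; points are ordered newest-first.
for series in results:
    print(series.metric.labels["storage_class"], series.points[0].value.int64_value)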

answered Nov 27 '25 by Mike Schwartz