I need to get the number of files in a GCS bucket.
I don't want to use list_blobs to read them one by one and increment a counter.
Is there some metadata I can query?
I need to download all the files in the bucket and process them. Now I want to do that with threads, so I need to split the files into groups somehow.
My idea was to use list_blobs with an offset and a size, but to do that I need to know the total number of files.
Any idea?
Thanks
I know the original question wanted to avoid using .list_blobs() to count the files in a bucket, but since I didn't find a different way, I'm posting this here for reference; it does work:
from google.cloud import storage
storage_client = storage.Client()
# list_blobs() returns an iterator over all blobs in the bucket
blobs_list = storage_client.list_blobs(bucket_or_name='name_of_your_bucket')
# exhaust the iterator and count the elements
print(sum(1 for _ in blobs_list))
.list_blobs() returns an iterator, so this answer basically loops over the iterator and counts the elements.
If you only want to count the files within a certain folder in your bucket, you can use the prefix keyword:
blobs_list = storage_client.list_blobs(
    bucket_or_name='name_of_your_bucket',
    prefix='name_of_your_folder',
)
FYI: this question suggests a different method to solve this:
How can I get number of files from gs bucket using python
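Since the original goal was to split the files into groups and download them with threads, note that you don't strictly need the count up front: you can materialize the iterator into a list and let a thread pool distribute the blobs. Here's a minimal sketch under that assumption; the bucket name, destination directory, and worker count below are placeholders:
import os
from concurrent.futures import ThreadPoolExecutor
from google.cloud import storage

def download_blob(blob, dest_dir='downloads'):  # hypothetical destination directory
    os.makedirs(dest_dir, exist_ok=True)
    # flatten the blob name into a local file name
    local_path = os.path.join(dest_dir, blob.name.replace('/', '_'))
    blob.download_to_filename(local_path)
    return local_path

storage_client = storage.Client()
# materialize the iterator so the blobs can be handed out to workers
blobs = list(storage_client.list_blobs('name_of_your_bucket'))
print(len(blobs), 'files to download')

# the executor takes care of splitting the work across threads
with ThreadPoolExecutor(max_workers=8) as executor:
    for path in executor.map(download_blob, blobs):
        print('downloaded', path)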
There's no way to do a single metadata query to get the count. You could run a command like:
gsutil ls gs://my-bucket/** | wc -l
but note that this command makes a number of bucket listing requests behind the scenes, which can take a long time if the bucket is large and will incur charges based on the number of operations performed.
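If you want to do the counting from Python and see those listing requests, the iterator returned by list_blobs exposes its result pages, and each page corresponds to one listing request. A rough sketch, assuming a placeholder bucket name:
from google.cloud import storage

storage_client = storage.Client()
blobs = storage_client.list_blobs('my-bucket')

total = 0
requests_made = 0
# each page of results corresponds to one listing request against the bucket
for page in blobs.pages:
    requests_made += 1
    total += sum(1 for _ in page)
print(total, 'objects counted across', requests_made, 'listing requests')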