I want to get a list of all the blobs in a Google Cloud Storage bucket using the Client Library for Python. According to the documentation I should use the <code>list_blobs()</code> function. The function appears to use two arguments, <code>max_results</code> and <code>page_token</code>, to achieve paging. I am not sure how to use them. In particular, where do I get the <code>page_token</code> from? I would have expected <code>list_blobs()</code> to provide a <code>page_token</code> for use in subsequent calls, but I cannot find any documentation on it. In addition, <code>max_results</code> is optional. What happens if I don't provide it? Is there a default limit? If so, what is it?


How does paging work in the list_blobs function in Google Cloud Storage Python Client Library

2 Answers

list_blobs() does use paging, but you do not use page_token to achieve it.

How It Works:

list_blobs() returns an iterator that walks through all the results, doing the paging behind the scenes. So simply doing this will get you through all the results, fetching pages as needed:

for blob in bucket.list_blobs():
    print(blob.name)
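To make "paging behind the scenes" concrete, here is a toy model of such an iterator. It is only a sketch and not the library's implementation: `iterate_all`, `make_fake_bucket`, and the numeric tokens are made up for illustration, and `fetch_page` stands in for a single API request.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Stand-in for one API request: takes a page token (None means "first page")
# and returns (items, next_token). next_token is None on the last page.
FetchPage = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iterate_all(fetch_page: FetchPage,
                max_results: Optional[int] = None,
                page_token: Optional[str] = None) -> Iterator[str]:
    """Yield items across pages, fetching each page only when it is needed."""
    remaining = max_results
    token = page_token  # page to START at, not the only page returned
    while remaining is None or remaining > 0:
        items, token = fetch_page(token)
        for item in items:
            if remaining is not None:
                if remaining == 0:
                    return  # total limit reached in the middle of a page
                remaining -= 1
            yield item
        if token is None:
            return  # that was the last page


def make_fake_bucket(names: List[str], page_size: int) -> FetchPage:
    """Hypothetical in-memory 'bucket' whose tokens are just list offsets."""
    def fetch_page(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
        start = int(token) if token is not None else 0
        end = start + page_size
        next_token = str(end) if end < len(names) else None
        return names[start:end], next_token
    return fetch_page


fetch = make_fake_bucket(["a", "b", "c", "d", "e"], page_size=2)
print(list(iterate_all(fetch)))                  # all items, three fetches
print(list(iterate_all(fetch, max_results=3)))   # total capped at three items
print(list(iterate_all(fetch, page_token="2")))  # starts at the second page
```

The caller sees one flat stream of items; the page boundaries only matter to the loop that fetches them.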

The Documentation is Wrong/Misleading:

As of 04/26/2017 this is what the docs say:

page_token (str) – (Optional) Opaque marker for the next “page” of blobs. If not passed, will return the first page of blobs.

This implies that the result will be a single page of results, with page_token determining which page. This is not correct. The result iterator iterates through multiple pages. What page_token actually represents is which page the iterator should START at. If no page_token is provided, it will start at the first page.
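As for where a page_token comes from: the iterator returned by list_blobs() is a google-api-core page iterator, and to my knowledge it exposes the token for the following page as `next_page_token` once a page has been fetched. A minimal sketch of fetching one batch and resuming later, assuming that attribute; `fetch_batch` is a hypothetical helper, not part of the library:

```python
def fetch_batch(client, bucket_name, batch_size, page_token=None):
    """Fetch one page of blob names and the token needed to resume after it.

    `client` is expected to be a google.cloud.storage.Client.
    """
    iterator = client.list_blobs(
        bucket_name,
        max_results=batch_size,  # caps the total, so the first page has at most this many
        page_token=page_token,   # None means "start at the first page"
    )
    page = next(iterator.pages)  # fetches a single page, nothing more
    names = [blob.name for blob in page]
    # Assumption: the iterator updates next_page_token after each fetched page;
    # it is None when there is nothing left to list.
    return names, iterator.next_page_token


# Usage (needs credentials and a real bucket, so it is left commented out):
# from google.cloud import storage
# client = storage.Client()
# names, token = fetch_batch(client, "my-bucket", 100)
# while token is not None:
#     names, token = fetch_batch(client, "my-bucket", 100, page_token=token)
```

This is only useful when you need to stop and resume across processes or requests. Within a single process, the plain for loop already does the same thing for you. Note that the service may return fewer items per page than `batch_size`.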

Helpful To Know:

max_results limits the total number of results returned by the iterator.

The iterator does expose pages if you need them:

for page in bucket.list_blobs().pages:
    for blob in page:
        print(blob.name)

153

answered Sep 19 '22 00:09

user2771609

Please read the inline comments:

from google.cloud import storage

# Named `client` so that the imported `storage` module is not shadowed.
client = storage.Client()

bucket_name = ''  # Fill in your bucket name here

# Limits the total number of results. Replace this with None to get all the blobs in the bucket.
max_results = 23_344

# When you pass `fields`, include "nextPageToken": the library needs it to fetch
# the following pages (pagination itself is managed for you by the library).
# Inside "items(...)", list every field you would like to fetch.
# Here are the supported fields: https://cloud.google.com/storage/docs/json_api/v1/objects#resource

fields = 'items(name),nextPageToken'

# enumerate() numbers the blobs starting from 1.
for counter, blob in enumerate(client.list_blobs(bucket_name, fields=fields, max_results=max_results), start=1):
    print(counter, ')', blob.name)
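If you build the `fields` string in more than one place, it is easy to forget the nextPageToken part. A small sketch of a helper that always appends it; `listing_fields` is a hypothetical name, not something the library provides:

```python
def listing_fields(*item_fields: str) -> str:
    """Build a `fields` value for list_blobs() that keeps pagination working."""
    if not item_fields:
        raise ValueError("name at least one item field, for example 'name'")
    # nextPageToken is always included so the library can request further pages.
    return "items({}),nextPageToken".format(",".join(item_fields))


print(listing_fields("name"))
print(listing_fields("name", "size", "updated"))
```

The result can be passed straight to the `fields` argument used above.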

answered Sep 20 '22 00:09

Victor Klapholz

