
Python Azure SDK: Using list_blobs to get more than 5,000 results

I'm having trouble with the Python Azure SDK and haven't found anything on either Stack Overflow or the MSDN forums.

I want to use the Azure SDK's list_blobs() to get a list of blobs, but the container holds more than 5,000 of them (which is the maximum that maxresults allows per call).

If I take a look at the code in the SDK itself then I see the following:

    def list_blobs(self, container_name, prefix=None, marker=None,
                   maxresults=None, include=None, delimiter=None):

The description of the marker parameter is:

    marker:
        Optional. A string value that identifies the portion of the list
        to be returned with the next list operation. The operation returns
        a marker value within the response body if the list returned was
        not complete. The marker value may then be used in a subsequent
        call to request the next set of list items. The marker value is
        opaque to the client.

My problem is that I don't know how to use the marker to get the next set of 5,000 results. If I try something like this:

    blobs = blobservice.list_blobs(target_container, prefix=prefix)
    print(blobs.marker)

then the marker is always empty, which I assume is because list_blobs() already parses the blobs out of the response.

But if that is the case then how do I actually use the marker in a meaningful way?

I'm sorry if this is a stupid question but this actually is the first one that I didn't find an answer for, even after searching extensively.

Cheers!

user3755680 asked Jun 19 '14 08:06



1 Answer

The SDK returns the continuation token in an attribute called next_marker on the object that list_blobs() returns. Pass that value as the marker argument of the next call to get the next set of blobs. See the code below as an example. Here I'm listing 100 blobs from a container at a time:

    from azure.storage import BlobService

    blob_service = BlobService(account_name='<accountname>', account_key='<accountkey>')

    next_marker = None
    while True:
        # Request one page of up to 100 blobs, starting at the marker from the previous page.
        blobs = blob_service.list_blobs('<containername>', maxresults=100, marker=next_marker)
        next_marker = blobs.next_marker
        print(next_marker)
        print(len(blobs))
        # Stop once no continuation token comes back (None or an empty string).
        if not next_marker:
            break
    print("done")

P.S. The code above throws an exception on the last iteration, and I'm not sure why, but it should give you the idea.

Gaurav Mantri answered Sep 28 '22 21:09