I'm having trouble with the Python Azure SDK and haven't found anything on either Stack Overflow or the MSDN forums.
I want to use the Azure SDK's list_blobs() to get a list of blobs, but there are more than 5,000 of them (which is the maximum that maxresults allows).
If I take a look at the code in the SDK itself then I see the following:
def list_blobs(self, container_name, prefix=None, marker=None,
               maxresults=None, include=None, delimiter=None):
The description of 'marker' is:
marker:
Optional. A string value that identifies the portion of the list
to be returned with the next list operation. The operation returns
a marker value within the response body if the list returned was
not complete. The marker value may then be used in a subsequent
call to request the next set of list items. The marker value is
opaque to the client.
My problem is that I don't know how to use the marker to get the next set of 5,000 results. If I try something like this:
blobs = blobservice.list_blobs(target_container, prefix=prefix)
print(blobs.marker)
then the marker is always empty, which I assume is because list_blobs() already parses the blobs out of the response.
But if that is the case then how do I actually use the marker in a meaningful way?
I'm sorry if this is a stupid question but this actually is the first one that I didn't find an answer for, even after searching extensively.
Cheers!
If you want the names of all blobs in all containers of a storage account, call blob_service.list_containers() and, for each container it returns, list the blobs in that container. This is also a useful article on how to use Azure Blob Storage from Python.
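The container-by-container approach can be sketched as follows. This is only an illustration: FakeAccount, Item and all_blob_names are hypothetical names, the fake service exists so the example runs without a storage account, and it assumes that the objects returned by list_containers() and list_blobs() expose a name attribute. Each list_blobs() call is still subject to the per-call result limit, so large containers also need the marker.

```python
from collections import namedtuple

# Hypothetical stand-in for the container and blob objects (assumed to have .name).
Item = namedtuple("Item", ["name"])


class FakeAccount:
    """Hypothetical in-memory service: maps container names to blob names."""

    def __init__(self, containers):
        self.containers = containers

    def list_containers(self):
        return [Item(name) for name in self.containers]

    def list_blobs(self, container_name):
        return [Item(name) for name in self.containers[container_name]]


def all_blob_names(service):
    """Return {container name: [blob names]} for every container."""
    result = {}
    for container in service.list_containers():
        blobs = service.list_blobs(container.name)
        result[container.name] = [blob.name for blob in blobs]
    return result


account = FakeAccount({"images": ["a.png", "b.png"], "logs": ["x.log"]})
names_by_container = all_blob_names(account)
```

With a real service object, all_blob_names(blob_service) would be called the same way, provided the assumptions above hold for your SDK version.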
To list blobs hierarchically in the .NET SDK, call the BlobContainerClient.GetBlobsByHierarchy or BlobContainerClient.GetBlobsByHierarchyAsync method. A hierarchical listing can take an optional segment size.
To organize blobs into virtual directories, use a delimiter character in the blob name. The default delimiter character is a forward slash (/), but you can specify any character as the delimiter. If you name your blobs using a delimiter, then you can choose to list blobs hierarchically.
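To make the idea of virtual directories concrete, here is a small client-side sketch of how names are grouped by a delimiter. It is not an SDK call: list_level is a hypothetical helper and the blob names are made up. The service performs a comparable grouping itself when a delimiter is passed to a listing call.

```python
def list_level(blob_names, prefix="", delimiter="/"):
    """Split the names under `prefix` into virtual directories and direct blobs."""
    directories = set()
    blobs = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The remainder contains the delimiter, so `head` names a virtual directory.
            directories.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(directories), blobs


# Made-up blob names for illustration.
names = ["readme.txt", "logs/2020/a.log", "logs/2021/b.log", "img/cat.png"]
top_dirs, top_blobs = list_level(names)
log_dirs, log_blobs = list_level(names, prefix="logs/")
```

Calling list_level again with one of the returned directories as the prefix walks one level deeper, which is how a hierarchical listing is usually traversed.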
The SDK returns the continuation token in an attribute of the result called next_marker. Pass it as the marker argument of the next call to get the next set of blobs. See the code below as an example. Here I'm listing 100 blobs from a container at a time:
from azure.storage import BlobService

blob_service = BlobService(account_name='<accountname>', account_key='<accountkey>')

next_marker = None
while True:
    # Request up to 100 blobs, resuming from the previous continuation token.
    blobs = blob_service.list_blobs('<containername>', maxresults=100, marker=next_marker)
    next_marker = blobs.next_marker
    print(next_marker)
    print(len(blobs))
    # Stop when no marker comes back (covers both None and an empty string).
    if not next_marker:
        break
print("done")
P.S. The code above throws an exception on the last iteration. Not sure why. But it should give you an idea.
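The same loop can be wrapped in a generator so that callers see one flat sequence of blobs. This is a minimal sketch, not part of the SDK: iter_all_blobs, Page and FakePagedService are hypothetical names, and the fake service exists only so the example runs without a storage account. It assumes only what is shown above, namely that list_blobs() accepts maxresults and marker and that its result exposes next_marker.

```python
class Page(list):
    """Hypothetical stand-in for the SDK result: a list with a next_marker."""

    def __init__(self, items, next_marker=None):
        super().__init__(items)
        self.next_marker = next_marker


class FakePagedService:
    """Hypothetical in-memory service that pages through a list of names."""

    def __init__(self, names):
        self.names = names

    def list_blobs(self, container_name, maxresults=None, marker=None):
        start = int(marker) if marker else 0
        end = start + (maxresults or len(self.names))
        # Hand back a marker only when more items remain.
        next_marker = str(end) if end < len(self.names) else None
        return Page(self.names[start:end], next_marker)


def iter_all_blobs(service, container_name, page_size=100):
    """Yield every blob in the container, following next_marker page by page."""
    marker = None
    while True:
        page = service.list_blobs(container_name, maxresults=page_size, marker=marker)
        for blob in page:
            yield blob
        marker = page.next_marker
        if not marker:  # None and '' both mean the listing is complete
            break


fake = FakePagedService(["blob%d" % i for i in range(250)])
all_blobs = list(iter_all_blobs(fake, "<containername>"))
```

With a real service object you would pass blob_service in place of fake. Treating both None and an empty string as "no more pages" is a defensive choice, since the thread does not say which of the two the SDK returns at the end of a listing.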