Reading line by line from blob Storage in Windows Azure

Tags: windows, azure

Is there any way to read line by line from a text file in blob storage in Windows Azure?

Thanks

Asked Oct 22 '12 by Eman Aldhahri



2 Answers

Yes, you can do this with streams, and it doesn't necessarily require that you pull the entire file, though please read to the end (of the answer... not the file in question) because you may want to pull the whole file anyway.

Here is the code:

using System;
using System.IO;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

// Authenticate with the storage account name and key.
StorageCredentialsAccountAndKey credentials = new StorageCredentialsAccountAndKey(
    "YourStorageAccountName",
    "YourStorageAccountKey"
);
CloudStorageAccount account = new CloudStorageAccount(credentials, true);
CloudBlobClient client = new CloudBlobClient(account.BlobEndpoint.AbsoluteUri, account.Credentials);
CloudBlobContainer container = client.GetContainerReference("test");

// OpenRead() returns a stream that downloads the blob a block at a time
// as it is consumed, so the whole file is not pulled down up front.
CloudBlob blob = container.GetBlobReference("CloudBlob.txt");
using (var stream = blob.OpenRead())
{
    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}

I uploaded a text file called CloudBlob.txt to a container called test. The file was about 1.37 MB in size (I actually used the CloudBlob.cs file from GitHub, copied into the same file six or seven times). I tried this out with a block blob, which is likely what you'll be dealing with since you are talking about a text file.

This gets a reference to the blob as usual, then calls the OpenRead() method on the CloudBlob object, which returns a BlobStream that you can wrap in a StreamReader to get the ReadLine method. I ran Fiddler with this and noticed that it made three additional requests for blocks before it finished the file. It looks like the BlobStream has a few properties you can use to tweak how much read-ahead it does, but I didn't try adjusting them. According to one reference I found, the retry policy also works at the level of the last read, so it won't attempt to re-read the whole thing, just the last request that failed. Quoted here:

Lastly, the DownloadToFile/ByteArray/Stream/Text() methods perform their entire download in a single streaming get. If you use the CloudBlob.OpenRead() method it will utilize the BlobReadStream abstraction, which will download the blob one block at a time as it is consumed. If a connection error occurs, then only that one block will need to be re-downloaded (according to the configured RetryPolicy). Also, this will potentially help improve performance as the client may not need to cache a large amount of data locally. For large blobs this can help significantly; however, be aware that you will be performing a higher number of overall transactions against the service. -- Joe Giardino

I think it is important to note the caution Joe points out: this will lead to a larger overall number of transactions against your storage account. Depending on your requirements, however, this may still be the option you are looking for.
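Since the quote mentions the configured RetryPolicy, here is a minimal sketch of setting one on the client in the 1.x SDK (the retry count and interval are arbitrary example values, not recommendations):

// Retry each failed request (e.g. a single block read) up to 3 times,
// waiting 2 seconds between attempts. Values are arbitrary examples.
client.RetryPolicy = RetryPolicies.Retry(3, TimeSpan.FromSeconds(2));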

If these are massive files and you are doing a lot of this, it could mean many, many transactions (though you could see if you can tweak the properties on the BlobStream to increase the number of blocks retrieved at a time, etc.; a sketch of that follows below). It may still make sense to do a DownloadToStream on the CloudBlob (which will pull the entire contents down) and then read from that stream the same way I did above.
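As a rough sketch of that tweaking: the BlobStream returned by OpenRead() in the 1.x SDK appears to expose a ReadAheadSize property controlling how much is fetched per request. Treat that property as an assumption here and verify it against your SDK version:

// Sketch: increase how much data BlobStream fetches ahead per request.
// ReadAheadSize is assumed from the 1.x BlobStream API; verify it
// exists in your SDK version before relying on it.
using (BlobStream stream = blob.OpenRead())
{
    stream.ReadAheadSize = 4 * 1024 * 1024; // fetch ~4 MB per request

    using (StreamReader reader = new StreamReader(stream))
    {
        while (!reader.EndOfStream)
        {
            Console.WriteLine(reader.ReadLine());
        }
    }
}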

The only real difference is that one pulls smaller chunks at a time and the other pulls the full file immediately. There are pros and cons to each, and it will depend heavily on how large these files are and whether you plan on stopping somewhere in the middle of reading the file (such as "yeah, I found the string I was searching for!") or plan on reading the entire file anyway. If you plan on pulling the whole file no matter what (because you are processing the entire file, for example), then just use DownloadToStream and wrap that in a StreamReader.
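For completeness, a sketch of that full-download approach (reusing the blob reference from the code above) might look like this:

// Pull the entire blob down in one streaming get, then read line by line.
using (MemoryStream stream = new MemoryStream())
{
    blob.DownloadToStream(stream);   // downloads the full content at once
    stream.Position = 0;             // rewind before reading

    using (StreamReader reader = new StreamReader(stream))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Console.WriteLine(line);
        }
    }
}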

Note: I tried this with the 1.7 SDK. I'm not sure in which SDK version these options were introduced.

Answered Sep 20 '22 by MikeWo


To directly answer your question, you will have to write code to download the blob locally first and then read its content. This is mainly because you cannot just peek into a blob and read its content in the middle. If you have used Windows Azure Table Storage, by contrast, you can read specific content directly from a table.

As your text file is a blob located in Azure Blob storage, what you really need is to download the blob locally (to a local file or a memory stream) and then read its content. You will have to download the blob in full or in part depending on what type of blob you have uploaded. With page blobs you can download a specific range of content locally and process it. It would be helpful to know the difference between block and page blobs in this regard.
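As a rough sketch of such a partial read, the approach below seeks into the blob's read stream and pulls down just one chunk. This reuses the container reference from the first answer's code and assumes the stream returned by OpenRead() in the 1.x SDK supports seeking; verify that for your SDK version and blob type:

// Sketch: read a single 64 KB slice starting 1 MB into the blob.
// Assumes the read stream supports seeking; verify for your SDK/blob type.
CloudBlob blob = container.GetBlobReference("CloudBlob.txt");
using (var stream = blob.OpenRead())
{
    stream.Seek(1024 * 1024, SeekOrigin.Begin); // jump to the 1 MB offset
    byte[] buffer = new byte[64 * 1024];        // 64 KB chunk
    int bytesRead = stream.Read(buffer, 0, buffer.Length);
    // process buffer[0..bytesRead) here
}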

Answered Sep 22 '22 by AvkashChauhan