How to use Table.ExecuteQuerySegmentedAsync() with Azure Table Storage

I'm using the Azure Storage Client library 2.1 and trying to make a Table storage query asynchronous. I came up with this code:

public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var theQuery = _table.CreateQuery<TAzureTableEntity>()
                         .Where(tEnt => tEnt.PartitionKey == partitionKey);
    TableQuerySegment<TAzureTableEntity> querySegment = null;
    var returnList = new List<TAzureTableEntity>();
    while(querySegment == null || querySegment.ContinuationToken != null)
    {
        querySegment = await theQuery.AsTableQuery()
                                     .ExecuteSegmentedAsync(querySegment != null ?
                                         querySegment.ContinuationToken : null);
        returnList.AddRange(querySegment);
    }
    return returnList;
}

Let's assume the query returns a large set of data, so there will be many round trips to Table Storage. What bothers me is that we await a segment of data, add it to an in-memory list, await the next segment, add it to the same list, and so on. Why not just wrap a Task.Factory.StartNew() around a regular TableQuery? Like so:

public async Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    var returnList = await Task.Factory.StartNew(() =>
                                                 table.CreateQuery<TAzureTableEntity>()
                                                .Where(ent => ent.PartitionKey == partitionKey)
                                                .ToList());
    return returnList;
}

Done this way, it seems we wouldn't be bouncing back and forth to the SynchronizationContext so much. Or does it really matter?

Edit to Rephrase Question

What's the difference between the two scenarios mentioned above?

Hallmanac asked Oct 27 '13 23:10

2 Answers

The difference between the two is that your second version will block a ThreadPool thread for the whole time the query is executing. This might be acceptable in a GUI application (where all you want is to execute the code somewhere other than the UI thread), but it will negate any scalability advantages of async in a server application.

Also, if you don't want your first version to return to the UI context for each roundtrip (which is a reasonable requirement), then use ConfigureAwait(false) whenever you use await:

querySegment = await theQuery.AsTableQuery()
                             .ExecuteSegmentedAsync(…)
                             .ConfigureAwait(false);

This way, all iterations after the first one will (most likely) execute on a ThreadPool thread and not on the UI context.
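To make the shape of that loop concrete, here is a minimal self-contained sketch. It does not touch Azure at all: Segment and FetchSegmentAsync are hypothetical stand-ins for TableQuerySegment<T> and ExecuteSegmentedAsync, backed by an in-memory array, and the integer token is an assumption made purely for illustration. The only point is the pattern of awaiting each segment with ConfigureAwait(false) and looping until the token is null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SegmentedLoopSketch
{
    // Hypothetical stand-in for TableQuerySegment<T>: one page of results
    // plus a token that is null when there is nothing left to fetch.
    public sealed class Segment
    {
        public List<int> Results { get; set; }
        public int? ContinuationToken { get; set; }
    }

    // Hypothetical stand-in for ExecuteSegmentedAsync: returns up to
    // pageSize items starting at the token position, after a short delay
    // that simulates a network round trip.
    public static async Task<Segment> FetchSegmentAsync(int[] source, int? token, int pageSize)
    {
        await Task.Delay(10).ConfigureAwait(false);
        int start = token ?? 0;
        List<int> page = source.Skip(start).Take(pageSize).ToList();
        int next = start + page.Count;
        return new Segment
        {
            Results = page,
            ContinuationToken = next < source.Length ? next : (int?)null
        };
    }

    // Awaits one segment at a time; ConfigureAwait(false) means the
    // continuation does not need to resume on the captured context.
    public static async Task<List<int>> FetchAllAsync(int[] source, int pageSize)
    {
        var all = new List<int>();
        int? token = null;
        do
        {
            Segment segment = await FetchSegmentAsync(source, token, pageSize)
                .ConfigureAwait(false);
            all.AddRange(segment.Results);
            token = segment.ContinuationToken;
        } while (token != null);
        return all;
    }

    public static void Main()
    {
        int[] data = Enumerable.Range(1, 25).ToArray();
        // Blocking here is fine in a console app, which has no SynchronizationContext.
        List<int> result = FetchAllAsync(data, 10).GetAwaiter().GetResult();
        Console.WriteLine(result.Count); // prints 25
    }
}
```

While a segment is in flight no thread is held, which is the difference from wrapping a synchronous ToList() in Task.Run.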

BTW, in your second version you don't actually need await at all; you can just return the Task directly:

public Task<List<TAzureTableEntity>> GetByPartitionKey(string partitionKey)
{
    return Task.Run(() => table.CreateQuery<TAzureTableEntity>()
                               .Where(ent => ent.PartitionKey == partitionKey)
                               .ToList());
}

svick answered Sep 20 '22 23:09

Not sure if this is the answer you're looking for but I still want to mention it :).

As you may already know, the 2nd method (using Task) handles continuation tokens internally and returns only once all entities have been fetched, whereas in the 1st method each call fetches one set of entities (up to a maximum of 1000) and returns that result set along with a continuation token.

If you want to fetch all entities from a table, either method works. However, the 1st one gives you the flexibility to break out of the loop gracefully at any time, which you don't get with the 2nd one. So with the 1st approach you could essentially introduce pagination.

Let's assume you're building a web application that shows data from a table, and that the table contains a large number of entities (let's say 100000). Using the 1st method, you can fetch just 1000 entities and return them to the user; if the user wants more, you fetch the next set of 1000 and show those. You can keep doing that for as long as the user wants and there is data left in the table. With the 2nd method, the user would have to wait until all 100000 entities had been fetched from the table.
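A rough sketch of what such a paged read could look like is below. It assumes the Microsoft.WindowsAzure.Storage.Table namespace from the storage client library (2.1 or later) and a CloudTable you already have a reference to. PagedResult<T>, EntityReader<T> and GetPageAsync are names made up for this example, not part of the library. It has not been run against a live table, so treat it as an outline rather than a finished implementation.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical container for one page of entities plus the token for the next page.
public class PagedResult<T>
{
    public List<T> Items { get; set; }
    public TableContinuationToken NextToken { get; set; } // null when no more data
}

// Hypothetical helper class; only the CloudTable and TableQuery calls are library API.
public class EntityReader<T> where T : ITableEntity, new()
{
    private readonly CloudTable _table;

    public EntityReader(CloudTable table)
    {
        _table = table;
    }

    // Fetches a single segment (at most 1000 entities) instead of looping.
    // Pass null as the token for the first page, then pass back NextToken.
    public async Task<PagedResult<T>> GetPageAsync(string partitionKey, TableContinuationToken token)
    {
        string filter = TableQuery.GenerateFilterCondition(
            "PartitionKey", QueryComparisons.Equal, partitionKey);
        TableQuery<T> query = new TableQuery<T>().Where(filter);

        TableQuerySegment<T> segment = await _table
            .ExecuteQuerySegmentedAsync(query, token)
            .ConfigureAwait(false);

        return new PagedResult<T>
        {
            Items = segment.Results,
            NextToken = segment.ContinuationToken
        };
    }
}
```

The caller keeps NextToken between requests and passes it back in when the user asks for the next page; a null NextToken means the partition has been read to the end.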

Gaurav Mantri answered Sep 18 '22 23:09