
How to speed up a query with PartitionKey in Azure Table Storage

How do we increase the speed of this query?

We have approximately 100 consumers within the span of 1-2 minutes executing the following query. Each one of these runs represents 1 run of a consumption function.

        TableQuery<T> translationsQuery = new TableQuery<T>()
            .Where(
                TableQuery.CombineFilters(
                    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, sourceDestinationPartitionKey),
                    TableOperators.Or,
                    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, anySourceDestinationPartitionKey)
                )
            );

This query will yield approximately 5000 results.

Full code:

    public static async Task<IEnumerable<T>> ExecuteQueryAsync<T>(this CloudTable table, TableQuery<T> query) where T : ITableEntity, new()
    {
        var items = new List<T>();
        TableContinuationToken token = null;

        do
        {
            TableQuerySegment<T> seg = await table.ExecuteQuerySegmentedAsync(query, token);
            token = seg.ContinuationToken;
            items.AddRange(seg);
        } while (token != null);

        return items;
    }

    public static IEnumerable<Translation> Get<T>(string sourceParty, string destinationParty, string wildcardSourceParty, string tableName) where T : ITableEntity, new()
    {
        var acc = CloudStorageAccount.Parse(Environment.GetEnvironmentVariable("conn"));
        var tableClient = acc.CreateCloudTableClient();
        var table = tableClient.GetTableReference(Environment.GetEnvironmentVariable("TableCache"));
        var sourceDestinationPartitionKey = $"{sourceParty.ToLowerTrim()}-{destinationParty.ToLowerTrim()}";
        var anySourceDestinationPartitionKey = $"{wildcardSourceParty}-{destinationParty.ToLowerTrim()}";

        TableQuery<T> translationsQuery = new TableQuery<T>()
            .Where(
                TableQuery.CombineFilters(
                    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, sourceDestinationPartitionKey),
                    TableOperators.Or,
                    TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, anySourceDestinationPartitionKey)
                )
            );

        var over1000Results = table.ExecuteQueryAsync(translationsQuery).Result.Cast<Translation>();
        return over1000Results.Where(x => x.expireAt > DateTime.Now)
                              .Where(x => x.effectiveAt < DateTime.Now);
    }

During these executions, when there are 100 consumers, as you can see the requests will cluster and form spikes:


During these spikes, the requests often take over 1 minute:


How do we increase the speed of this query?

Asked by Alex Gordon, Nov 04 '19

People also ask

What is the fastest way to query entities out of azure?

When querying entities out of Azure, the fastest possible way to do that is with both the PartitionKey and RowKey. These are the only indexed fields in Table Storage, and any query that utilises both of these will be returned in a matter of a few milliseconds.

What is Windows Azure Storage partitioning?

All Windows Azure storage abstractions (Blob, Table, Queue) are built upon the same stack (whitepaper here). While there's much more to tell about it, the reason it scales is its partitioning logic. Whenever you store something on Windows Azure storage, it is located on some partition in the system.

How does partitionkey work in table storage?

In Table Storage, you have to decide on the PartitionKey yourself. In essence, you are responsible for the throughput you’ll get on your system. If you put every entity in the same partition (by using the same partition key), you’ll be limited to the size of the storage machines for the amount of storage you can use.

How does a database know which partition to query?

By doing this, table storage will immediately know which partition to query and can simply do an ID lookup on RowKey within that partition. Less fast but still fast enough is querying by specifying only the PartitionKey: table storage will know which partition to query.


4 Answers

There are three things you can consider:

1. First of all, get rid of the Where clauses that you apply to the query result. It's better to push as many conditions as possible into the query itself (and if you have any indexes on your tables, include those too). For now, you can change your query as below:

var translationsQuery = new TableQuery<T>()
    .Where(TableQuery.CombineFilters(
        TableQuery.CombineFilters(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, sourceDestinationPartitionKey),
            TableOperators.Or,
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, anySourceDestinationPartitionKey)),
        TableOperators.And,
        TableQuery.CombineFilters(
            TableQuery.GenerateFilterConditionForDate("effectiveAt", QueryComparisons.LessThan, DateTime.Now),
            TableOperators.And,
            TableQuery.GenerateFilterConditionForDate("expireAt", QueryComparisons.GreaterThan, DateTime.Now))
    ));

Because you have a large amount of data to retrieve, it's better to run your queries in parallel. So you could replace the do...while loop inside the ExecuteQueryAsync method with a Parallel.ForEach I wrote based on Stephen Toub's Parallel.While; this way it will reduce the query execution time. This is a good choice because you can remove .Result when you call this method, but it has a small limitation that I'll talk about after this part of the code:

public static IEnumerable<T> ExecuteQueryAsync<T>(this CloudTable table, TableQuery<T> query) where T : ITableEntity, new()
{
    var items = new List<T>();
    TableContinuationToken token = null;

    Parallel.ForEach(new InfinitePartitioner(), (ignored, loopState) =>
    {
        TableQuerySegment<T> seg = table.ExecuteQuerySegmented(query, token);
        token = seg.ContinuationToken;
        items.AddRange(seg);

        if (token == null) // It's better to change this constraint by looking at https://www.vivien-chevallier.com/Articles/executing-an-async-query-with-azure-table-storage-and-retrieve-all-the-results-in-a-single-operation
            loopState.Stop();
    });

    return items;
}

And then you can call it in your Get method:

return table.ExecuteQueryAsync(translationsQuery).Cast<Translation>();

As you can see, the method itself is not async (you should change its name), and Parallel.ForEach is not compatible with passing in an async method. This is why I've used ExecuteQuerySegmented instead. But to make it more performant and get all the benefits of an asynchronous method, you can replace the above ForEach loop with an ActionBlock from TPL Dataflow or the ParallelForEachAsync extension method from the AsyncEnumerator NuGet package.

2. It's a good choice to execute independent parallel queries and then merge the results, even if the performance improvement is at most 10 percent. This gives you time to find the best performance-friendly query. But never forget to include all your constraints in it, and test both ways to learn which one better suits your problem.

3. I'm not sure whether it's a good suggestion or not, but try it and see the results. As described on MSDN:

The Table service enforces server timeouts as follows:

  • Query operations: During the timeout interval, a query may execute for up to a maximum of five seconds. If the query does not complete within the five-second interval, the response includes continuation tokens for retrieving remaining items on a subsequent request. See Query Timeout and Pagination for more information.

  • Insert, update, and delete operations: The maximum timeout interval is 30 seconds. Thirty seconds is also the default interval for all insert, update, and delete operations.

If you specify a timeout that is less than the service's default timeout, your timeout interval will be used.

So you can play with the timeout and check whether there is any performance improvement.
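As a sketch of what "playing with the timeout" could look like, both the server-side timeout and the client-side execution cap can be set per request through `TableRequestOptions` (the specific durations below are arbitrary choices for illustration, not recommendations):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class TimeoutExample
{
    public static async Task<TableQuerySegment<T>> QueryWithTimeoutAsync<T>(
        CloudTable table, TableQuery<T> query, TableContinuationToken token)
        where T : ITableEntity, new()
    {
        var options = new TableRequestOptions
        {
            // Server-side timeout for a single request.
            ServerTimeout = TimeSpan.FromSeconds(5),
            // Client-side cap across all retries of this request.
            MaximumExecutionTime = TimeSpan.FromSeconds(15)
        };

        return await table.ExecuteQuerySegmentedAsync(query, token, options, operationContext: null);
    }
}
```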

UPDATE 06-30-2021

Thanks to @WouterVanRanst for looking closely into the snippet above, I decided to update it to use another overload of the Parallel.ForEach method, make the loop single threaded, and prevent the race condition on TableContinuationToken. You can find a description of partition-local variables, with an example, here on MSDN. Here's the new look of the ExecuteQueryAsync<T> method:

public static IEnumerable<T> ExecuteQueryAsync<T>(this CloudTable table, TableQuery<T> query) where T : ITableEntity, new()
{
    TableContinuationToken token = null;
    var items = new List<T>();

    Parallel.ForEach(new InfinitePartitioner(), () =>
    {
        return null as TableQuerySegment<T>;
    }, (ignored, loopState, segment) =>
    {
        segment = table.ExecuteQuerySegmented(query, token) as TableQuerySegment<T>;
        
        token = segment.ContinuationToken;

        if (token == null)
            loopState.Stop();

        return segment;
    },
    (seg) => items.AddRange(seg)
    );

    return items;
}

NOTE: Of course you can polish the code above or find a better approach to prevent the race condition, but this is a simple one. I'd be glad to hear your thoughts on it.

Answered by MasLoo, Oct 12 '22


So the secret is not only in the code but also in how you set up your Azure storage tables.

a) One of the prominent options to optimize your queries in Azure is to introduce caching. This will drastically reduce your overall response times and thereby avoid the bottleneck during the peak period you mentioned.
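As a minimal sketch of option (a), the result for a given query could be held in an in-process `MemoryCache` with a short TTL, so most of the ~100 concurrent consumers are served from memory instead of hitting storage (the cache key, TTL, and helper name here are illustrative assumptions, not from the question):

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.Caching;
using System.Threading.Tasks;

public static class TranslationCache
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public static async Task<IEnumerable<T>> GetOrFetchAsync<T>(
        string cacheKey,
        Func<Task<IEnumerable<T>>> fetch,
        TimeSpan ttl)
    {
        if (Cache.Get(cacheKey) is IEnumerable<T> cached)
            return cached;

        // Consumers arriving after the entry is populated are served
        // from memory until it expires; only cache misses pay the
        // storage round-trip.
        var items = await fetch();
        Cache.Set(cacheKey, items, DateTimeOffset.UtcNow.Add(ttl));
        return items;
    }
}
```

For example: `TranslationCache.GetOrFetchAsync(sourceDestinationPartitionKey, () => table.ExecuteQueryAsync(query), TimeSpan.FromSeconds(30))`. Note that concurrent misses on a cold cache will still all query storage; a per-key lock would be needed to prevent that stampede.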

b) Also, when querying entities out of Azure, the fastest possible way to do that is with both the PartitionKey and RowKey. These are the only indexed fields in Table Storage, and any query that utilises both of these will be returned in a matter of a few milliseconds. So ensure you use both PartitionKey and RowKey wherever you can.
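When both keys are known, this becomes a point lookup rather than a filtered query; a minimal sketch using `TableOperation.Retrieve`, assuming the `Translation` entity class from the question:

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class PointQueryExample
{
    public static async Task<Translation> GetSingleAsync(
        CloudTable table, string partitionKey, string rowKey)
    {
        // Point query: both indexed keys are specified, so the service
        // goes straight to one entity instead of scanning a partition.
        TableOperation retrieve = TableOperation.Retrieve<Translation>(partitionKey, rowKey);
        TableResult result = await table.ExecuteAsync(retrieve);
        return (Translation)result.Result;
    }
}
```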

See more details here: https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-design-for-query

Hope this helps.

Answered by S.S.Prabhu, Oct 06 '22


    var over1000Results = table.ExecuteQueryAsync(translationsQuery).Result.Cast<Translation>();
    return over1000Results.Where(x => x.expireAt > DateTime.Now)
                          .Where(x => x.effectiveAt < DateTime.Now);

Here is one of the problems: you are running the query and then filtering the results in memory with these Where clauses. Move the filters into the query itself, before it runs, which should help a lot.

Second, you should provide some limit on the number of rows retrieved from the table.
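One way to bound the result size is `TableQuery<T>.Take`, which caps how many entities the service returns; a sketch, again assuming the `Translation` entity from the question (the limit of 100 is an arbitrary example):

```csharp
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class LimitedQueryExample
{
    public static async Task<TableQuerySegment<Translation>> GetTopAsync(
        CloudTable table, string partitionKey, int limit)
    {
        // Take caps how many entities come back, instead of streaming
        // the full ~5000-row result set to every consumer.
        TableQuery<Translation> query = new TableQuery<Translation>()
            .Where(TableQuery.GenerateFilterCondition(
                "PartitionKey", QueryComparisons.Equal, partitionKey))
            .Take(limit);

        return await table.ExecuteQuerySegmentedAsync(query, null);
    }
}
```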

Answered by Miranda, Oct 12 '22


Unfortunately, the query below introduces a full table scan:

    TableQuery<T> translationsQuery = new TableQuery<T>()
        .Where(
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, sourceDestinationPartitionKey),
                TableOperators.Or,
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, anySourceDestinationPartitionKey)
            )
        );

You should split it into two Partition Key filters and query them separately, which will become two partition scans and perform more efficiently.
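One way to apply this, reusing the `ExecuteQueryAsync` extension method from the question, is to run the two single-partition queries concurrently and concatenate the results:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;

public static class SplitQueryExample
{
    public static async Task<IEnumerable<T>> QueryTwoPartitionsAsync<T>(
        this CloudTable table, string firstKey, string secondKey)
        where T : ITableEntity, new()
    {
        // Each query targets exactly one partition, so the service does
        // two partition scans instead of one full table scan.
        TableQuery<T> first = new TableQuery<T>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, firstKey));
        TableQuery<T> second = new TableQuery<T>().Where(
            TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, secondKey));

        var results = await Task.WhenAll(
            table.ExecuteQueryAsync(first),
            table.ExecuteQueryAsync(second));

        return results.SelectMany(r => r);
    }
}
```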

Answered by Zhaoxing Lu, Oct 12 '22