
Significant performance degradation on ExecuteQuerySegmentedAsync between Microsoft.Azure.Cosmos.Table and Microsoft.WindowsAzure.Storage

I've been researching a move from Storage Account table storage to CosmosDB table storage. Currently I am using the WindowsAzure.Storage (9.3.3) library to query data in a .NET Core 3.1 application. As part of this migration I have switched to the Microsoft.Azure.Cosmos.Table 1.0.7 library. I wrote the LINQPad benchmark below to compare the performance of both when doing a full table scan.

async Task Main()
{
    var timer = Stopwatch.StartNew();
    await QueryCosmosDb().ConfigureAwait(false);
    timer.Stop();
    var cosmosExecutionTime = timer.Elapsed;

    timer = Stopwatch.StartNew();
    await QueryTableStorage().ConfigureAwait(false);
    timer.Stop();
    var tableExecutionTime = timer.Elapsed;
    
    cosmosExecutionTime.Dump();
    tableExecutionTime.Dump();
}

public async Task QueryCosmosDb()
{
    var cosmosTableEndpoint = new Uri($"https://***.table.cosmos.azure.com:443/");
    var storageAccount = new Microsoft.Azure.Cosmos.Table.CloudStorageAccount(new Microsoft.Azure.Cosmos.Table.StorageCredentials("***", "****"), cosmosTableEndpoint);
    var client = storageAccount.CreateCloudTableClient();
    var table = client.GetTableReference("tablename");
    var query = new Microsoft.Azure.Cosmos.Table.TableQuery();
    Microsoft.Azure.Cosmos.Table.TableContinuationToken token = null;
    do
    {
        var segment = await table.ExecuteQuerySegmentedAsync(query, token).ConfigureAwait(false);
        token = segment.ContinuationToken.Dump();
    }
    while (token != null);
}

public async Task QueryTableStorage()
{
    var storageAccount = new Microsoft.WindowsAzure.Storage.CloudStorageAccount(new Microsoft.WindowsAzure.Storage.Auth.StorageCredentials("***", "****"), true);
    var client = storageAccount.CreateCloudTableClient();
    var table = client.GetTableReference("tablename");
    var query = new Microsoft.WindowsAzure.Storage.Table.TableQuery();
    Microsoft.WindowsAzure.Storage.Table.TableContinuationToken token = null;
    do
    {
        var segment = await table.ExecuteQuerySegmentedAsync(query, token).ConfigureAwait(false);
        token = segment.ContinuationToken;
    }
    while (token != null);
}

The Storage Account table and the CosmosDB table hold identical datasets of roughly 200k entities.

The Cosmos Table account has a shared provisioned throughput of 2200 RU/s.

When using the Cosmos executor with the Microsoft.Azure.Cosmos.Table library, I get an execution time of ~3 hours. The Storage Account table with the Microsoft.WindowsAzure.Storage library takes ~2 minutes. If I switch the Microsoft.Azure.Cosmos.Table library to use the REST executor in the cloud table client, I get an execution time of ~3 minutes.
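For reference, switching executors looks roughly like the sketch below. It assumes the `TableClientConfiguration.UseRestExecutorForCosmosEndpoint` property exposed by Microsoft.Azure.Cosmos.Table 1.0.x. The class name, method name, and parameters are illustrative only and not part of the benchmark above.

```csharp
using System;
using Microsoft.Azure.Cosmos.Table;

public static class RestExecutorExample
{
    // Hypothetical helper: builds a table reference whose client talks to the
    // Cosmos endpoint through the REST executor instead of the Cosmos executor.
    public static CloudTable GetTableUsingRestExecutor(
        string accountName, string accountKey, Uri cosmosTableEndpoint)
    {
        var storageAccount = new CloudStorageAccount(
            new StorageCredentials(accountName, accountKey), cosmosTableEndpoint);

        // Assumption: this flag is what selects the REST executor for
        // *.table.cosmos.azure.com endpoints in the 1.0.x package.
        var configuration = new TableClientConfiguration
        {
            UseRestExecutorForCosmosEndpoint = true
        };

        var client = storageAccount.CreateCloudTableClient(configuration);
        return client.GetTableReference("tablename");
    }
}
```

The query loop itself stays the same; only the client construction changes.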

Has anyone encountered similar behavior, or is anyone aware of issues around empty table queries?

I have also opened a ticket in the GitHub Issues for azure-cosmos-table-dotnet.

Blaine asked Jul 01 '20 15:07




1 Answer

The time difference comes from the internal implementation of the ExecuteQuery method, so there is nothing we can fix on our side unless Microsoft notices the problem and fixes it in an upcoming release. That said, now that the library has been deprecated in favor of the common Microsoft.Azure.Cosmos library, the issue has presumably been resolved there. Hope this helps.

ChinnarajS answered Oct 07 '22 19:10
