Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cosmos DB paging performance with OFFSET and LIMIT

Tags:

I'm creating an API based on Cosmos DB and ASP.NET Core 3.0. Using the Cosmos DB 4.0 preview 1 .NET Core SDK. I implemented paging using the OFFSET and LIMIT clause. I'm seeing the RU charge increase significantly the higher in the page count you go. Example for a page size of 100 items:

Page 1: 9.78 RU
Page 10: 37.28 RU
Page 100: 312.22 RU
Page 500: 358.68 RU

The queries are simply:

SELECT * from c OFFSET [page*size] LIMIT [size]

Am I doing something wrong, or is this expected? Does OFFSET require scanning the entire logical partition? I'm querying against a single partition key with about 10000 items in the partition. It seems like the more items in the partition, the worse the performance gets. (See also comment by "Russ" in the uservoice for this feature).

Is there a better way to implement efficient paging through the entire partition?

Edit 1: Also, I notice doing queries in the Cosmos Emulator also slow waaayyy down when doing OFFSET/LIMIT in a partition with 10,000 items.

Edit 2: Here is my repository code for the query. Essentially, it is wrapping the Container.GetItemQueryStreamIterator() method and pulling out the RU while processing IAsyncEnumerable. The query itself is the SQL string above, no LINQ or other mystery there.

public async Task<RepositoryPageResult<T>> GetPageAsync(int? page, int? pageSize, EntityFilters filters){

// Enforce default page and size if null
int validatedPage = GetValidatedPageNumber(page);
int validatedPageSize = GetValidatedPageSize(pageSize);

IAsyncEnumerable<Response> responseSet = cosmosService.Container.GetItemQueryStreamIterator(
    BuildQuery(validatedPage, validatedPageSize, filters),
    requestOptions: new QueryRequestOptions()
    {
        PartitionKey = new PartitionKey(ResolvePartitionKey())
    });

var pageResult = new RepositoryPageResult<T>(validatedPage, validatedPageSize);
await foreach (Response response in responseSet)
{
    LogResponse(response, COSMOS_REQUEST_TYPE_QUERY_ITEMS); // Read RU charge
    if (response.Status == STATUS_OK && response.ContentStream != null)
    {
        CosmosItemStreamQueryResultSet<T> responseContent = await response.ContentStream.FromJsonStreamAsync<CosmosItemStreamQueryResultSet<T>>();
        pageResult.Entities.AddRange(responseContent.Documents);
        foreach (var item in responseContent.Documents)
        {
            cache.Set(item.Id, item); // Add each item to cache
        }
    }
    else
    {
        // Unexpected status. Abort processing.
        return new RepositoryPageResult<T>(false, response.Status, message: "Unexpected response received while processing query response.");
    }
}

pageResult.Succeeded = true;
pageResult.StatusCode = STATUS_OK;
return pageResult;

}

Edit 3:

Running the same raw SQL from cosmos.azure.com, I noticed in query stats:

OFFSET 0 LIMIT 100: Output document count = 100, Output document size = 44 KB
OFFSET 9900 LIMIT 100: Output document count = 10000, Output document size = 4.4 MB

And indeed, inspecting the network tab in browser reveals 100 separate HTTP queries, each retrieving 100 documents! So OFFSET appears to be currently not at the database, but at the client, which retrieves EVERYTHING before throwing away the first 99 queries worth of data. This can't be the intended design? Isn't the query supposed to tell the database to return only 100 items total, in 1 response, not all 10000 so the client can throw away 9900?

like image 314
Noah Stahl Avatar asked Nov 08 '19 18:11

Noah Stahl


2 Answers

Based on the code it would mean that the client is skipping the documents and thus the increase of RUs.

I tested the same scenario on the browser (cosmos.azure.com, uses the JS SDK) and the behavior is the same, as offset moves, the RU increases.

Paging in cosmos.azure.com

Paging in cosmos.azure.com increasing OFFSET

like image 59
Matias Quaranta Avatar answered Oct 12 '22 22:10

Matias Quaranta


It is documented here in the official documentation, under remarks https://docs.microsoft.com/en-us/azure/cosmos-db/sql-query-offset-limit

The RU charge of a query with OFFSET LIMIT will increase as the number of terms being offset increases. For queries that have multiple pages of results, we typically recommend using continuation tokens. Continuation tokens are a "bookmark" for the place where the query can later resume. If you use OFFSET LIMIT, there is no "bookmark". If you wanted to return the query's next page, you would have to start from the beginning.

like image 32
mikelowrey23 Avatar answered Oct 12 '22 23:10

mikelowrey23