
Cosmos DB continuation token size influences whether query returns new documents

I was experimenting with Azure Cosmos DB (via the .NET SDK) and noticed something odd.

Normally, when I request a query page by page using continuation tokens, I never get documents that were created after the first continuation token was created. I can observe changed documents, and removed (or rather newly filtered-out) documents disappear, but new ones never show up. However, if I only allow 1 kB continuation tokens (the smallest limit I can set), I get the new documents as well, as long as they sort into the remaining pages, obviously.

This kind of makes sense: with the size limit, I prevent Cosmos DB from including the serialized index lookup and whatnot in the continuation token. As a downside, Cosmos DB has to recreate the resume state for every page I request, which costs some extra RUs, at least according to this discussion. As a side effect, new documents end up in the result.

Now, I have a couple of questions regarding this.

  1. Is this behavior reliable? I'd love to see some documentation on this.
  2. Is the amount of RUs saved by a larger continuation token significant?
  3. Is there another way to get new documents included in the result?
  4. Are my assumptions completely wrong?
asked Jan 16 '19 11:01 by vit



1 Answer

I am from the Cosmos DB engineering team.

  1. Is this behavior reliable? I'd love to see some documentation on this.

We introduced this feature (limiting the continuation token size) because customers asked for a way to reduce the size of the response continuation. We consider the effects of pruning the continuation too much detail to expose, since for most customers the subtle change in behavior shouldn't matter.

  2. Is the amount of RUs saved by a larger continuation token significant?

This depends on how much work it takes to produce the state from the index. For example, if we have to evaluate a range predicate (e.g. _ts > some discrete second), the RUs saved can be significant, because we potentially avoid scanning a large number of index keys corresponding to _ts. That scan can be O(number of documents) in the worst case, where at most 1 document was inserted per second and every document therefore has its own _ts key. In this scenario, assuming X continuations, we save (X - 1) * O(number of documents) worth of work.
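To make the (X - 1) * O(number of documents) estimate concrete, here is a minimal sketch of that arithmetic. It is a toy cost model, not a Cosmos DB API: the function names are made up for illustration, and it counts index keys scanned rather than RUs, since the actual RU charge per key is not something this answer specifies.

```python
def index_keys_scanned(num_index_keys: int, num_pages: int, token_keeps_state: bool) -> int:
    """Toy model: index keys scanned across all pages of one paged query."""
    if token_keeps_state:
        # The range scan runs once; later pages resume from the serialized state.
        return num_index_keys
    # The state is rebuilt for every page, so the range scan repeats each time.
    return num_index_keys * num_pages


def keys_saved(num_index_keys: int, num_pages: int) -> int:
    """Work avoided by letting the continuation token carry the state."""
    pruned = index_keys_scanned(num_index_keys, num_pages, token_keeps_state=False)
    full = index_keys_scanned(num_index_keys, num_pages, token_keeps_state=True)
    return pruned - full


# Worst case from the text: one _ts index key per document.
saved = keys_saved(num_index_keys=100_000, num_pages=10)
print(saved)  # (10 - 1) * 100_000 = 900000
```

With a single page there is nothing to save, and the saving grows linearly with the number of continuations.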

  3. Is there another way to get new documents included in the result?

No, not unless you force Cosmos DB to re-evaluate the index on every continuation by setting the continuation token size limit header to 1 (kB). Queries are typically meant to be executed fairly quickly across continuations, so the chance of users seeing new documents should be fairly small. Ideally we should implement snapshot isolation, retrieving results with the session token from the first continuation, but we haven't done this yet.
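The difference between the two modes can be illustrated with a small in-memory simulation. This is only a toy model under loud assumptions: `page_with_state` and `page_with_bookmark` are hypothetical names, the "token" here is a plain Python object, and a real Cosmos DB continuation token does not literally contain a list of document ids. The sketch only shows why resuming from captured state hides new documents while re-evaluating from a bookmark surfaces them.

```python
def sorted_items(docs):
    """Return (sort_key, doc_id) pairs in query order."""
    return sorted((key, doc_id) for doc_id, key in docs.items())


def page_with_state(docs, page_size, token=None):
    """Large-token mode (toy): the token carries the lookup captured on page 1."""
    if token is None:
        token = {"ids": [doc_id for _, doc_id in sorted_items(docs)], "pos": 0}
    ids = token["ids"][token["pos"]:token["pos"] + page_size]
    next_pos = token["pos"] + len(ids)
    more = next_pos < len(token["ids"])
    next_token = {"ids": token["ids"], "pos": next_pos} if more else None
    # Documents deleted in the meantime drop out; new ones are never considered.
    return [doc_id for doc_id in ids if doc_id in docs], next_token


def page_with_bookmark(docs, page_size, token=None):
    """Pruned-token mode (toy): only a bookmark survives, the lookup is redone."""
    remaining = [item for item in sorted_items(docs) if token is None or item > token]
    page = remaining[:page_size]
    next_token = page[-1] if len(remaining) > page_size else None
    return [doc_id for _, doc_id in page], next_token


docs = {"a": 1, "b": 2, "c": 4}  # doc id -> sort key
first_state, state_token = page_with_state(docs, page_size=2)
first_bookmark, bookmark_token = page_with_bookmark(docs, page_size=2)

docs["n"] = 3  # a new document arrives between page requests

second_state, _ = page_with_state(docs, 2, state_token)
second_bookmark, _ = page_with_bookmark(docs, 2, bookmark_token)
print(second_state)     # ['c']
print(second_bookmark)  # ['n', 'c']
```

Both modes return the same first page. Only the bookmark mode picks up the new document, and only because its sort key places it after the bookmark, which matches the observation in the question.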

  4. Are my assumptions completely wrong?

Your assumptions are spot on :)

answered Oct 06 '22 07:10 by Krishnan Sundaram