
Cosmos DB (DocumentDB API): Efficient way to query most recent document by partition ID?

I have a Cosmos DB collection with numerous partitions based on a device ID. I frequently have use cases that require retrieving the most recent document by a specific device ID. I'm currently using the SELECT TOP 1 functionality available in the DocumentDB API as shown below to accomplish this:

SELECT TOP 1 *
FROM c
WHERE c.deviceId = 5
ORDER BY c.timeStamp DESC

As one would expect, this approach consumes more RU/s and performs worse as the collection and its individual partitions grow in size. As a temporary remedy, I have added WHERE clauses that limit the scope of the query by timestamp:

SELECT TOP 1 *
FROM c
WHERE c.deviceId = 5
 AND c.timeStamp >= 1506608558 --timestamps are unix/epoch based to optimize indexing
 AND c.timeStamp <= 1506694958
ORDER BY c.timeStamp DESC

I would like to know if there's a better way to select the latest document by partition ID, since this WHERE clause could produce unexpected or missing results (for example, when a device has not written a document within the chosen time window).

JTW asked Sep 29 '17 14:09



1 Answer

I had a similar scenario where the Id of the asset I am tracking forms my partition key, and within that partition there are 2,880 events per day per asset, and that will continue to grow over time.

While the full event history was required for other use cases, this particular use case only needed the latest event. So we created a second collection that uses the same partition key but contains only the CURRENT state, i.e. the latest event for each asset.

When an event is written to the WRITE side (the collection that persists all events for an asset), a trigger updates the READ side with the latest value.

While this may appear to double the effort on writes, in our use case the performance gain on the read side made up for it.
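To make the write-side/read-side idea concrete, here is a minimal in-memory sketch in Python. It is not the actual trigger or change feed code from this setup: the names `Event`, `LatestStateProjection`, `record` and `latest` are hypothetical, two dictionaries stand in for the two collections, and the rule that an older event never overwrites a newer one is my own assumption about how out-of-order writes should be handled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    device_id: int   # partition key (hypothetical field name)
    timestamp: int   # Unix epoch seconds, as in the question
    payload: dict = field(default_factory=dict)


class LatestStateProjection:
    """In-memory stand-in for the WRITE-side and READ-side collections."""

    def __init__(self) -> None:
        # WRITE side: full event history, grouped by partition key.
        self.history: Dict[int, List[Event]] = {}
        # READ side: exactly one "current" document per partition key.
        self.current: Dict[int, Event] = {}

    def record(self, event: Event) -> None:
        # Always persist the event to the history.
        self.history.setdefault(event.device_id, []).append(event)
        # Assumption: only replace the current document when the incoming
        # event is at least as new, so late arrivals do not regress it.
        latest = self.current.get(event.device_id)
        if latest is None or event.timestamp >= latest.timestamp:
            self.current[event.device_id] = event

    def latest(self, device_id: int) -> Optional[Event]:
        # A single keyed lookup; no ORDER BY over the partition is needed.
        return self.current.get(device_id)


store = LatestStateProjection()
store.record(Event(5, 1506608558, {"reading": 1}))
store.record(Event(5, 1506694958, {"reading": 2}))
store.record(Event(5, 1506650000, {"reading": 3}))  # arrives out of order
print(store.latest(5).timestamp)  # 1506694958
```

The point of the design is that the read cost no longer depends on how large a partition's history has grown, because the latest document is fetched by key rather than found by sorting.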

I found this MS article to be of use: Working with the change feed support in Azure Cosmos DB

Mr Slim answered Oct 18 '22 20:10