retrieving a document by id is slow across partitions in cosmos db

Question

I have a scenario where I need to retrieve a single document based on its id property from azure cosmos db. The only problem is I don't know the partition key and thus cannot use the document URI to access it.

From my understanding writing a simple query like

SELECT * from c WHERE c.id = "id here"

should be the way to go but I'm experiencing severe performance issues with this query. Most queries take 30s to 60s to complete and seem to consume insane amounts of RU/s. When executing 10 concurrent queries the max RU/s per partition went as high as 30.000. (10.00 per partition was provisioned) Resulting in throttling and even slower responses.

The collection comprises 10 partitions with around 3 Mb per partition, so 30 Mb in total and around 1,00,000 documents. My indexing policy looks like this:

{
    "indexingMode": "lazy",
    "automatic": true,
    "includedPaths": [
        {
            "path": "/*",
            "indexes": [
                {
                    "kind": "Range",
                    "dataType": "Number",
                    "precision": -1
                },
                {
                    "kind": "Hash",
                    "dataType": "String",
                    "precision": 3
                }
            ]
        }
    ],
    "excludedPaths": []
}

And the consistency is set to EVENTUAL since I don't really care about read/write order. The collection is under some write pressure with about 30 writes per minute and there's a TTL of 1 year for each document, yet this doesn't seem to produce a measurable impact on the RU/s. I experience this sort of problem only when querying documents.

Has anyone had similar problems and can offer a fix/mitigation? Am I doing something wrong with my query or indexing policy? I don't know why my query is consuming that much resources.

Mark Trinidad · Accepted Answer

I got similar problem even. My database is 16 GB with 2 partitions and has 10,000 RU per partition.

By gathering query metrics, I found that query by id could be doing table scan and not looking up from index.

Here is the metrics of query by id:

SELECT * FROM c where c.id = 'id-here'
--Read 1 record in 1497.00 ms, 339173.109 RU
--QueryPreparationTime(ms): CompileTime = 2, LogicalBuildTime = 0, 
     PhysicalPlanBuildTime = 0, OptimizationTime = 0
--QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0, 
     RuntimeExecutionTimes = 356, WriteOutputTime = 0

Notice the time spent mostly in DocumentLoadTime and IndexLookupTime = 0.

While query by indexed field is pretty fast.

SELECT * FROM c WHERE c.indexedField = 'value'
--Read 4 records in 2.00 ms, 7.56 RU
--QueryPreparationTime(ms): CompileTime = 0, LogicalBuildTime = 0, 
       PhysicalPlanBuildTime = 0, OptimizationTime = 0
--QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1, 
       RuntimeExecutionTimes = 0, WriteOutputTime = 0

Contrast to the query by id, this doesn't consumed DocumentLoadTime as the index was used, IndexLookupTime is 1 ms.

The problem is id should be the primary key and should be indexed by default but it looks like it doesn't. You couldn't even add custom indexing policy for it.

I'm currently logged a ticket to Microsoft support and waiting for clarifications.

Update:

Microsoft support responded and they've resolved the issue. They've added IndexVersion 2 for the collection. Unfortunately, it is not yet available from the portal and newly created accounts/collection are still not using the new version. You'll have to contact Microsoft Support to made changes to your accounts.

Here are the new results from a collection with index version 2 and there's a massive improvement.

SELECT * FROM c where c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 344,940.79 RUs
-- Index Version 2: Request Charge: 3.31 RUs

SELECT * FROM c WHERE c.indexedField = 'value' AND c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 150,666.22 RUs 
-- Index Version 2: Request Charge: 5.65 RUs

retrieving a document by id is slow across partitions in cosmos db

Tags:

azure

azure-cosmosdb

Peter

1 Answers

Mark Trinidad

Recent Activity

Donate For Us

retrieving a document by id is slow across partitions in cosmos db

Tags:

azure

azure-cosmosdb

Peter

1 Answers

Mark Trinidad

Related questions

Recent Activity

Donate For Us