I have a scenario where I need to retrieve a single document based on its id property from azure cosmos db. The only problem is I don't know the partition key and thus cannot use the document URI to access it.
From my understanding writing a simple query like
SELECT * from c WHERE c.id = "id here"
should be the way to go but I'm experiencing severe performance issues with this query. Most queries take 30s to 60s to complete and seem to consume insane amounts of RU/s. When executing 10 concurrent queries the max RU/s per partition went as high as 30.000. (10.00 per partition was provisioned) Resulting in throttling and even slower responses.
The collection comprises 10 partitions with around 3 Mb per partition, so 30 Mb in total and around 1,00,000 documents. My indexing policy looks like this:
{
"indexingMode": "lazy",
"automatic": true,
"includedPaths": [
{
"path": "/*",
"indexes": [
{
"kind": "Range",
"dataType": "Number",
"precision": -1
},
{
"kind": "Hash",
"dataType": "String",
"precision": 3
}
]
}
],
"excludedPaths": []
}
And the consistency is set to EVENTUAL
since I don't really care about read/write order. The collection is under some write pressure with about 30 writes per minute and there's a TTL of 1 year for each document, yet this doesn't seem to produce a measurable impact on the RU/s. I experience this sort of problem only when querying documents.
Has anyone had similar problems and can offer a fix/mitigation? Am I doing something wrong with my query or indexing policy? I don't know why my query is consuming that much resources.
I got similar problem even. My database is 16 GB with 2 partitions and has 10,000 RU per partition.
By gathering query metrics, I found that query by id
could be doing table scan and not looking up from index.
Here is the metrics of query by id:
SELECT * FROM c where c.id = 'id-here'
--Read 1 record in 1497.00 ms, 339173.109 RU
--QueryPreparationTime(ms): CompileTime = 2, LogicalBuildTime = 0,
PhysicalPlanBuildTime = 0, OptimizationTime = 0
--QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0,
RuntimeExecutionTimes = 356, WriteOutputTime = 0
Notice the time spent mostly in DocumentLoadTime
and IndexLookupTime = 0
.
While query by indexed field is pretty fast.
SELECT * FROM c WHERE c.indexedField = 'value'
--Read 4 records in 2.00 ms, 7.56 RU
--QueryPreparationTime(ms): CompileTime = 0, LogicalBuildTime = 0,
PhysicalPlanBuildTime = 0, OptimizationTime = 0
--QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1,
RuntimeExecutionTimes = 0, WriteOutputTime = 0
Contrast to the query by id, this doesn't consumed DocumentLoadTime
as the index was used, IndexLookupTime
is 1 ms.
The problem is id
should be the primary key and should be indexed by default but it looks like it doesn't. You couldn't even add custom indexing policy for it.
I'm currently logged a ticket to Microsoft support and waiting for clarifications.
Update:
Microsoft support responded and they've resolved the issue. They've added IndexVersion 2
for the collection. Unfortunately, it is not yet available from the portal and newly created accounts/collection are still not using the new version. You'll have to contact Microsoft Support to made changes to your accounts.
Here are the new results from a collection with index version 2 and there's a massive improvement.
SELECT * FROM c where c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 344,940.79 RUs
-- Index Version 2: Request Charge: 3.31 RUs
SELECT * FROM c WHERE c.indexedField = 'value' AND c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 150,666.22 RUs
-- Index Version 2: Request Charge: 5.65 RUs
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With