Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Azure DocumentDB Query by Id is very slow

I got a 16GB collection with 2 partitions. When I query a document by it's Id, it is very slow. But querying by an indexed field is fast. Both are cross-partition query and if I pass partition key with the query it is fast but partition key is not always available for my query. Got similar results using .NET SDK and Document Explorer Query in Azure Portal.

The collection has custom indexing policy but as far as I know you don't need to index Id or it may not be even possible.

Here are my queries and their corresponding request charges.

SELECT * FROM c where c.id = 'unique-id-123'
-- Request Charge: 344940.79 RUs, Document Count: 1

SELECT * FROM c WHERE c.otherId = 'NOT-so-uniqueId-123'
-- Request Charge: 5.08 RUs, Document Count: 3

As you know, Id is unique so the query returns 1 document while the second query is filtered by otherId which is not so unique and returns 3 documents. Also notice the insanely high RUs consumption with the first query.

So why the 2nd query is faster than by Id?


Update:
Here are the gathered metrics for the above queries.

Query By Id:

Read 1 records in 1497 ms, 339173.109 RU, Size: 6873022 KB
QueryPreparationTime(ms): CompileTime = 2, LogicalBuildTime = 0,
     PhysicalPlanBuildTime = 0, OptimizationTime = 0
QueryEngineTime(ms): DocumentLoadTime = 1126, IndexLookupTime = 0,
     RuntimeExecutionTimes = 356, WriteOutputTime = 0

Query by indexed field:

Read 4 records in 2 ms, 7.56 RU, Size: 9 KB
QueryPreparationTime(ms): CompileTime = 0, LogicalBuildTime = 0, 
     PhysicalPlanBuildTime = 0, OptimizationTime = 0
QueryEngineTime(ms): DocumentLoadTime = 0, IndexLookupTime = 1, 
     RuntimeExecutionTimes = 0, WriteOutputTime = 0

These proves that query by Id is doing table scan as most of the time spent was from DocumentLoadTime and no value for IndexLookupTime.
But I thought Id should be the primary key and indexed by default as per this answer by @andrew-liu.

like image 581
Mark Trinidad Avatar asked Apr 06 '18 03:04

Mark Trinidad


1 Answers

Microsoft support responded and they've resolved the issue. They've added IndexVersion 2 for the collection. Unfortunately, it is not yet available from the portal and newly created accounts/collection are still not using the new version. You'll have to contact Microsoft Support to made changes to your accounts.

Here are the new results from a collection with index version 2 and there's a massive improvement.

SELECT * FROM c where c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 344,940.79 RUs
-- Index Version 2: Request Charge: 3.31 RUs

SELECT * FROM c WHERE c.indexedField = 'value' AND c.id = 'uniqueValue'
-- Index Version 1: Request Charge: 150,666.22 RUs 
-- Index Version 2: Request Charge: 5.65 RUs
like image 69
Mark Trinidad Avatar answered Nov 16 '22 00:11

Mark Trinidad