Our cosmos db aggregate query seems slow and costs a lot of RUs. Here are the details (plus see screenshot below): 2.4s and 3222RUs to count a result set of 414k records. Also this for just one count. Normally we would want to do a sum on many fields at once (possible only within a single partition), but performance for that is much worse.
There are 2 million records in this collection. We are using Cosmos DB w/SQL API. This particular collection is partitioned by country_code and there are 414,732 records in France ("FR") and the remainder in US. Document size is averages 917 bytes and maybe min is 800 bytes, max 1300 bytes.
Note that we have also tried a much sparser partitioning key like device_id (of which there are 2 million, 1 doc per device here) which has worse results for this query. The c.calcuated.flag1 field just represents a "state" that we want to keep a count of (we actually have 8 states that I'd like to summarize on).
The indexing on this collection is the default, which uses "consistent" index mode, and indexes all fields (and includes range indexes for Number and String). RU setting is at 20,000, and there is no other activity on the DB.
So let me know your thoughts on this. Can Cosmos DB be used reasonably to get a few sums or counts on fields without ramping up our RU charges and taking a long time? While 2.4s is not awful, we really need sub-second queries for this kind of thing. Our application (IoT based), often needs individual documents, but also sometimes needs these kinds of counts across all documents in a country.
Is there a way to improve performance?
The Cosmos DB team has now made some significant changes to aggregation performance and how indexes are used. This is their indexing "v2" strategy and was only recently rolled out (it may not be available to all accounts yet, contact MSFT if you have an older db that needs upgrading).
You can compare the new results to the picture I originally posted.
You'll note now that Document load time shows as 0ms and the retrieved document size is 0 bytes. The load time I can confirm is really quite fast now so it is possible it is under 1ms when measured from the server side. And document size of 0 makes more sense since no documents need to be retrieved for this (only count based on the index).
Finally you can see that the RUs dropped from 3222 to 7.4 !!!! A pretty drastic difference.
Summing on multiple columns at once within a single partition is also quite performant now and we can do about 8 sums at once across 2 million documents with ~50 RUs and it takes about 20-70ms when measured from a function API endpoint (so includes network time).
More work still needs to be done by Cosmos DB team to allow for cross partition multi-column aggregations, but the improvements we have now are quite promising.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With