
Most efficient way to bulk fetch by known IDs

What is the 'best' way to fetch a bunch of documents, assuming I have a list of their IDs?

I know I could try various things, but at small scale all options probably have similar performance. So far I have tried nothing beyond reading the docs.

Maybe there is no 'best' way, but what are the trade-offs between the various methods (speed, cost, overall throughput, ...)?

Sigh - I knew this would get downvoted, along the lines of 'what have you tried', 'we won't write your code for you', etc. I cannot do meaningful performance analysis until I have thousands of parallel requests arriving simultaneously against terabytes of data. I swear I am not lazy or unwilling to put the work in; I just don't want to get into production, hit performance issues, and then be told 'why on earth did you do it that way?'

pm100 asked Oct 30 '22
1 Answer

Some general tips on the best way to perform reads with DocumentDB.

  • If you have a small number of documents, then using ReadDocumentAsync across multiple threads, each fetching a document by partition key and id, will be the best approach. Each read costs 1 RU per 1 KB document and completes in under 10 ms at p99.
  • If you have a large batch of documents, then a query like SELECT * FROM c WHERE c.partitionKey = 'pk' AND c.id IN ('1','2',..., 'N') will be more efficient: fewer connections from the client, and also fewer RUs on the server side (typically < 1 RU per document returned).
  • If you need to fetch data across multiple partition keys, it's harder to tell whether individual reads or a single query would perform better. This requires more detailed testing and depends on the number of documents read and the number of distinct partition keys.
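To make the second option concrete, here is a minimal sketch of building the single-partition IN query as a parameterized statement (safer than string-concatenating ids). The helper name, the `partitionKey` property, and the SDK call shown in the trailing comment are illustrative assumptions, not part of the answer above; the query-building itself is plain Python.

```python
# Sketch: build a parameterized single-partition batch query
# for a list of known document ids. Assumes documents expose a
# 'partitionKey' property (adjust to your schema).

def build_batch_query(ids, pk):
    """Return (query_text, parameters) for a Cosmos DB SQL IN query."""
    # One named parameter per id: @id0, @id1, ...
    id_params = [{"name": f"@id{i}", "value": doc_id}
                 for i, doc_id in enumerate(ids)]
    placeholders = ", ".join(p["name"] for p in id_params)
    query = (
        "SELECT * FROM c "
        f"WHERE c.partitionKey = @pk AND c.id IN ({placeholders})"
    )
    params = [{"name": "@pk", "value": pk}] + id_params
    return query, params

query, params = build_batch_query(["1", "2", "3"], "pk")
print(query)
# With the azure-cosmos Python SDK, this would then be passed to something like:
#   container.query_items(query=query, parameters=params, partition_key="pk")
```

Keeping the query scoped to one partition key (as the answer recommends) lets the service route it to a single partition rather than fanning out across all of them.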
Aravind Krishna R. answered Jan 02 '23