MongoDB: which is faster, a single find() query or many find_one() calls?

I have the following problem related to MongoDB database design. Here is my situation:

  • I have a collection with about 50k documents (15 kB each),
  • every document has a dictionary storing data samples,
  • my query always retrieves all the data from the document,
  • every query uses an index,
  • the collection has only one index (on a single datetime field),
  • in most cases, I need to get data from many documents (typically 25 < N < 100),
  • it is easier for me to perform many SELECT queries than a single one,
  • there are a lot of updates in my database, though far fewer than SELECT queries,
  • I use the WiredTiger engine (the newest version of MongoDB),
  • the server instance and the web application run on the same machine.

I have two options for making a SELECT query:

  • perform a single query retrieving all the documents I am interested in,
  • perform N queries, each retrieving a single document, where typically 25 < N < 100 (and what about the scenarios where 100 < N < 1k or 1k < N < 10k?).
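A minimal sketch of the two options, assuming PyMongo and a hypothetical indexed datetime field named `ts` (the question does not name the field). Both functions accept a PyMongo `Collection`; the point is only how many calls, and therefore round trips, reach the server.

```python
from datetime import datetime
from typing import Any, Iterable


def fetch_one_by_one(collection: Any, timestamps: Iterable[datetime]) -> list:
    """Option 2: one find_one() call (one round trip) per requested document."""
    docs = []
    for ts in timestamps:
        doc = collection.find_one({"ts": ts})  # "ts" is a hypothetical field name
        if doc is not None:
            docs.append(doc)
    return docs


def fetch_in_one_query(collection: Any, timestamps: Iterable[datetime]) -> list:
    """Option 1: a single find() using $in on the same indexed field."""
    cursor = collection.find({"ts": {"$in": list(timestamps)}})
    # The driver may still pull a large result set in several batches
    # while iterating the cursor, but it is one query, not N.
    return list(cursor)
```

Both variants use the same index; what changes is the number of requests the application sends.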

So the question is: is there any additional overhead when I perform many small queries instead of a single one? In relational databases, issuing many queries is considered very bad practice, but what about NoSQL? I am asking about general practice: should I avoid that many queries?

In the documentation, I read that what matters is not the number of queries but the number of document searches. Is that true?

Thanks for your help ;)

Marcin Ziąbek asked Sep 05 '25 09:09


2 Answers

There is a question similar to the one you asked: "Is it ok to query mongodb multiple times".

IMO, for your use case, i.e. 25 < N < 100, you should definitely go with batching.

In the case of single queries:

  • looping in a single thread will not suffice; you'll have to make parallel requests, which creates additional overhead,
  • every request incurs its own TCP/IP overhead,
  • each query has a certain amount of setup and teardown (creating and exhausting a cursor), which adds unnecessary overhead.
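To make the per-request overhead concrete, here is a back-of-the-envelope model, not a benchmark: every request pays a fixed cost (round trip plus query and cursor setup), and every document pays a transfer/decoding cost. The parameter values below are hypothetical placeholders that you would have to measure on your own machine.

```python
def estimated_ms(n_docs: int, n_requests: int,
                 per_request_ms: float, per_doc_ms: float) -> float:
    """Total time = fixed cost per request + transfer/decode cost per document."""
    return n_requests * per_request_ms + n_docs * per_doc_ms


# Hypothetical figures: 0.3 ms fixed cost per request, 0.05 ms per 15 kB document.
N = 100
single_queries = estimated_ms(N, n_requests=N, per_request_ms=0.3, per_doc_ms=0.05)
one_batch = estimated_ms(N, n_requests=1, per_request_ms=0.3, per_doc_ms=0.05)
print(f"{N} find_one() calls: ~{single_queries:.1f} ms, one find(): ~{one_batch:.1f} ms")
```

In this model the per-document term is the same for both approaches (roughly the same index lookups and data transferred); only the number of fixed per-request costs differs, and that term grows linearly with N.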

As explained in the answer to that question, there appears to be a sweet spot between how many values to batch together and the number of round trips, and it also depends on your document type.

In broader terms, anything with 10 < N < 1000 should be batched, and any remaining records should be split into further batches; querying a single document at a time would definitely create unnecessary overhead.
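A minimal sketch of splitting a large key list into batches, one `find()` per batch, assuming PyMongo and the same hypothetical `ts` field. The `batch_size` default of 500 is an arbitrary starting point for tuning, not a recommended value.

```python
from itertools import islice
from typing import Any, Iterable, Iterator, List


def chunked(keys: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield successive lists of at most `size` keys."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(keys)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def fetch_in_batches(collection: Any, keys: Iterable[Any], batch_size: int = 500) -> list:
    """Issue one $in query per chunk instead of one find_one() per key."""
    docs = []
    for chunk in chunked(keys, batch_size):
        # "ts" is a hypothetical field name; batch_size is a value to tune.
        docs.extend(collection.find({"ts": {"$in": chunk}}))
    return docs
```

With N keys this sends ceil(N / batch_size) queries, so the batch size is the knob for finding the sweet spot between request count and request size.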

Rahul answered Sep 08 '25 00:09


The problem with performing many small queries instead of one query is network overhead, that is, the round-trip latency of every request.

For a single request in batch processing this may not be much, but if you make many requests like these, or use this technique on the frontend, it will decrease performance.

Also, you may need to process the data manually, for example sorting or aggregating it yourself.
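One example of such manual processing: MongoDB does not guarantee that a `$in` query returns documents in the order of the requested keys, so if the caller needs that order it has to be restored on the client. A minimal sketch, assuming the hypothetical `ts` field is unique per document:

```python
from typing import Any, Dict, Iterable, List, Optional


def in_requested_order(docs: Iterable[Dict[str, Any]], keys: List[Any],
                       field: str = "ts") -> List[Optional[Dict[str, Any]]]:
    """Return documents in the order of `keys`; None marks a key with no match.

    Assumes `field` is unique per document; if it is not, the last document
    seen for a key wins.
    """
    by_key = {doc[field]: doc for doc in docs}
    return [by_key.get(k) for k in keys]
```

If plain ascending order is enough, sorting on the server with the cursor's `sort()` method is an alternative to this client-side step.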

Dan Ionescu answered Sep 08 '25 00:09