I'm working on a project that uses Solr for full-text search and MongoDB as persistent storage. Basically, searches in Solr return Mongo ids that we then use to fetch the documents.
The problem is that some Solr searches return results on the order of thousands of ids. These results are actually what we expect, so there's no issue with Solr. The problem comes when we want to fetch, say, 10k ids from MongoDB. The query uses $in but takes way too long; after checking the MongoDB profiler, it seems that Mongo spends a lot of time waiting to acquire read locks.
Are there any alternative approaches? Maybe still using $in, but splitting the id set into smaller chunks, as in the sketch below?
As a side note, we're using Java 8 with Spring 4.0 and Spring-Data-Mongo 1.6.
For additional context, the collection has 1.3 million documents, with each document averaging 11 KB in size.
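For reference, here is a minimal sketch of that chunking idea (assuming an injected MongoTemplate; MyDocument and the chunk size of 500 are placeholders, not tuned values):

import java.util.ArrayList;
import java.util.List;

import org.bson.types.ObjectId;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// MyDocument stands in for our mapped document class.
private static final int CHUNK_SIZE = 500; // arbitrary starting point, would need tuning

public List<MyDocument> fetchByIds(MongoTemplate mongoTemplate, List<ObjectId> ids) {
    List<MyDocument> result = new ArrayList<>(ids.size());
    for (int from = 0; from < ids.size(); from += CHUNK_SIZE) {
        // subList is a view, so chunking does not copy the id list
        List<ObjectId> chunk = ids.subList(from, Math.min(from + CHUNK_SIZE, ids.size()));
        result.addAll(mongoTemplate.find(
                Query.query(Criteria.where("_id").in(chunk)), MyDocument.class));
    }
    return result;
}

Is this even the right direction, or does it just trade one long query for many short ones?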
Here is an example of the query:
{"_id" : {
"$in" : [
ObjectId("5441614a5d28a9872823694c"),
ObjectId("544155eb5d28a987281aa112"),
ObjectId("5441500e5d28a9872815b917"),
ObjectId("544153285d28a987281877b9"),
ObjectId("544159095d28a987281c1f5c"),
ObjectId("54415b105d28a987281d3ad7"),
ObjectId("54415a995d28a987281cf0e6"),
ObjectId("544160215d28a9872822383b"),
ObjectId("544160e85d28a98728230342"),
ObjectId("544157ba5d28a987281b7dea"),
ObjectId("54415e375d28a9872820508b"),
ObjectId("544150f75d28a98728169563"),
ObjectId("54415c6b5d28a987281e8bcb"),
ObjectId("54415a6d5d28a987281cd704").............]}}
And this is the output of explain() for a small id set:
{
"cursor" : "BtreeCursor _id_ multi",
"isMultiKey" : false,
"n" : 14,
"nscannedObjects" : 14,
"nscanned" : 27,
"nscannedObjectsAllPlans" : 14,
"nscannedAllPlans" : 27,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"_id" : [
[
ObjectId("5441500e5d28a9872815b917"),
ObjectId("5441500e5d28a9872815b917")
],
[
ObjectId("544150f75d28a98728169563"),
ObjectId("544150f75d28a98728169563")
],
[
ObjectId("544153285d28a987281877b9"),
ObjectId("544153285d28a987281877b9")
],
[
ObjectId("544155eb5d28a987281aa112"),
ObjectId("544155eb5d28a987281aa112")
],
[
ObjectId("544157ba5d28a987281b7dea"),
ObjectId("544157ba5d28a987281b7dea")
],
[
ObjectId("544159095d28a987281c1f5c"),
ObjectId("544159095d28a987281c1f5c")
],
[
ObjectId("54415a6d5d28a987281cd704"),
ObjectId("54415a6d5d28a987281cd704")
],
[
ObjectId("54415a995d28a987281cf0e6"),
ObjectId("54415a995d28a987281cf0e6")
],
[
ObjectId("54415b105d28a987281d3ad7"),
ObjectId("54415b105d28a987281d3ad7")
],
[
ObjectId("54415c6b5d28a987281e8bcb"),
ObjectId("54415c6b5d28a987281e8bcb")
],
[
ObjectId("54415e375d28a9872820508b"),
ObjectId("54415e375d28a9872820508b")
],
[
ObjectId("544160215d28a9872822383b"),
ObjectId("544160215d28a9872822383b")
],
[
ObjectId("544160e85d28a98728230342"),
ObjectId("544160e85d28a98728230342")
],
[
ObjectId("5441614a5d28a9872823694c"),
ObjectId("5441614a5d28a9872823694c")
]
]
},
"server" : "0001a22df018:27017"
}
Slow queries can happen when you do not have proper DB indexes. Indexes support the efficient execution of queries in MongoDB; without them, MongoDB must perform a collection scan, i.e. scan every document in the collection, to select those that match the query.
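As a concrete illustration, a minimal Spring Data sketch for declaring a secondary index (just a sketch: the _id index used by the question's query exists by default, and someField / MyDocument are made-up names):

import org.springframework.data.domain.Sort.Direction;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.index.Index;

// Declare a secondary index so queries on 'someField' can avoid a full
// collection scan; MongoDB creates the _id index automatically.
public void ensureSecondaryIndex(MongoTemplate mongoTemplate) {
    mongoTemplate.indexOps(MyDocument.class)
            .ensureIndex(new Index().on("someField", Direction.ASC));
}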
You should apply limit() to the cursor before retrieving any documents from the database. Use limit() to maximize performance and prevent MongoDB from returning more results than you need to process.
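For illustration, the same point with the Spring Data API from the question's stack (MyDocument is again a placeholder):

import java.util.List;

import org.bson.types.ObjectId;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;

// limit() caps the result set server-side, so MongoDB can stop scanning
// once n matching documents have been found.
public List<MyDocument> findFirstN(MongoTemplate mongoTemplate, List<ObjectId> ids, int n) {
    Query query = Query.query(Criteria.where("_id").in(ids)).limit(n);
    return mongoTemplate.find(query, MyDocument.class);
}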
Perhaps this information can help, just for reference.
The size of the collection is larger than 1.3M x 11 KB ≈ 14.6 GB (not a small one).
The fraction of documents you want to query is 10K / 1.3M ≈ 0.77%.
The documents are indexed, so finding any single one should be very fast. But the collection is large, and as you didn't provide information about the ids, I'll just suppose the distribution of the documents for these ids is almost arbitrary.
Firstly, MongoDB will try to find each document in memory. When it cannot find any more, it loads new data from disk into memory for the remaining ids and searches again, repeating until the work is finished. The number of loads from disk may be the main factor determining query performance, and it depends on the distribution of your ids: if the matching documents are densely distributed, the query should be very fast; otherwise it may be slow. So the speed depends on the distribution of the documents you are searching for.
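If that theory holds, one cheap experiment (just a sketch, not something I have measured) is to sort the ids before querying, so each batch walks a contiguous range of the _id index and, with ObjectIds, touches documents inserted around the same time:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.bson.types.ObjectId;

// ObjectId is Comparable and its leading four bytes encode a creation
// timestamp, so sorting groups ids of documents inserted close together.
public List<ObjectId> sortForLocality(List<ObjectId> ids) {
    List<ObjectId> sorted = new ArrayList<>(ids);
    Collections.sort(sorted);
    return sorted;
}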
Using a sharded collection (with more shard instances) may also give some help.