MongoDB query by id using the $in operator is too slow for large sets of ids, alternatives?

I'm working on a project that uses Solr for full-text search and MongoDB as persistent storage. Basically, searches in Solr return Mongo ids that we then use to fetch the documents.

The problem is that some Solr searches return thousands of ids. These results are what we expect, so there is no issue with Solr here. The problem comes when we want to fetch, say, 10k ids from MongoDB. The query uses $in but takes far too long; according to the MongoDB profiler, Mongo spends a lot of time waiting to acquire read locks.

Are there any alternative approaches? Maybe still using $in, but splitting the id set into smaller chunks?

As a side note, we're using Java 8 with Spring 4.0 and Spring Data MongoDB 1.6.

Also, as additional information, the collection has 1.3 million documents, averaging 11 KB each.

Here is an example of the query:

  {"_id" : {
        "$in" : [
            ObjectId("5441614a5d28a9872823694c"),
            ObjectId("544155eb5d28a987281aa112"),
            ObjectId("5441500e5d28a9872815b917"),
            ObjectId("544153285d28a987281877b9"),
            ObjectId("544159095d28a987281c1f5c"),
            ObjectId("54415b105d28a987281d3ad7"),
            ObjectId("54415a995d28a987281cf0e6"),
            ObjectId("544160215d28a9872822383b"),
            ObjectId("544160e85d28a98728230342"),
            ObjectId("544157ba5d28a987281b7dea"),
            ObjectId("54415e375d28a9872820508b"),
            ObjectId("544150f75d28a98728169563"),
            ObjectId("54415c6b5d28a987281e8bcb"),
            ObjectId("54415a6d5d28a987281cd704").............]}}

And this is the result of explain for a small set:

{
"cursor" : "BtreeCursor _id_ multi",
"isMultiKey" : false,
"n" : 14,
"nscannedObjects" : 14,
"nscanned" : 27,
"nscannedObjectsAllPlans" : 14,
"nscannedAllPlans" : 27,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
    "_id" : [
        [
            ObjectId("5441500e5d28a9872815b917"),
            ObjectId("5441500e5d28a9872815b917")
        ],
        [
            ObjectId("544150f75d28a98728169563"),
            ObjectId("544150f75d28a98728169563")
        ],
        [
            ObjectId("544153285d28a987281877b9"),
            ObjectId("544153285d28a987281877b9")
        ],
        [
            ObjectId("544155eb5d28a987281aa112"),
            ObjectId("544155eb5d28a987281aa112")
        ],
        [
            ObjectId("544157ba5d28a987281b7dea"),
            ObjectId("544157ba5d28a987281b7dea")
        ],
        [
            ObjectId("544159095d28a987281c1f5c"),
            ObjectId("544159095d28a987281c1f5c")
        ],
        [
            ObjectId("54415a6d5d28a987281cd704"),
            ObjectId("54415a6d5d28a987281cd704")
        ],
        [
            ObjectId("54415a995d28a987281cf0e6"),
            ObjectId("54415a995d28a987281cf0e6")
        ],
        [
            ObjectId("54415b105d28a987281d3ad7"),
            ObjectId("54415b105d28a987281d3ad7")
        ],
        [
            ObjectId("54415c6b5d28a987281e8bcb"),
            ObjectId("54415c6b5d28a987281e8bcb")
        ],
        [
            ObjectId("54415e375d28a9872820508b"),
            ObjectId("54415e375d28a9872820508b")
        ],
        [
            ObjectId("544160215d28a9872822383b"),
            ObjectId("544160215d28a9872822383b")
        ],
        [
            ObjectId("544160e85d28a98728230342"),
            ObjectId("544160e85d28a98728230342")
        ],
        [
            ObjectId("5441614a5d28a9872823694c"),
            ObjectId("5441614a5d28a9872823694c")
        ]
    ]
},
"server" : "0001a22df018:27017"

}

asked Oct 30 '14 17:10

xburgos

1 Answer

Perhaps this information can help; it is just for reference.

The collection holds at least 1.3M x 11 KB, roughly 14 GB of data (not a small one).
The fraction of documents you want to fetch is 10K / 1.3M, about 0.77%.

The documents are indexed by _id, so finding any single one should be very fast. But the collection is large. Since you didn't provide information about the ids, I'll assume the documents for these ids are distributed almost arbitrarily across the collection.
MongoDB will first try to serve the documents from memory. For those it cannot find there, it loads data from disk into memory for the remaining ids, and repeats until the query is complete. The number of loads from disk is likely the main factor in query performance, and it depends on how your ids are distributed. If the matching documents are densely clustered, the query should be very fast; otherwise it may be slow.

Using a sharded collection (with more shard instances) may help.
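
The chunking idea raised in the question could be sketched roughly as below. This is only an illustration, not something measured against the poster's data: the class name `ChunkedIdFetch`, the helper names, and the `fetchChunk` callback are all hypothetical, and the ids are handled as hex strings for simplicity. Sorting the ids first is an assumption that follows from the point about id distribution: each chunk then covers a narrower range of the _id index. Whether that actually reduces disk loads or lock waits would have to be verified with the profiler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

class ChunkedIdFetch {

    /** Splits ids into consecutive sublists of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> ids, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, ids.size());
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    /**
     * Sorts a copy of the ids, then calls fetchChunk once per chunk and
     * concatenates the results. fetchChunk is a hypothetical callback that
     * would run one $in query for the ids it receives.
     */
    static <D> List<D> fetchInChunks(List<String> ids, int chunkSize,
                                     Function<List<String>, List<D>> fetchChunk) {
        List<String> sorted = new ArrayList<>(ids);
        Collections.sort(sorted); // equal-length lowercase hex sorts like the raw bytes
        List<D> result = new ArrayList<>();
        for (List<String> chunk : partition(sorted, chunkSize)) {
            result.addAll(fetchChunk.apply(chunk));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList(
                "5441614a5d28a9872823694c",
                "544155eb5d28a987281aa112",
                "5441500e5d28a9872815b917");
        // Stand-in fetcher that simply echoes the ids it was asked for.
        List<String> fetched = fetchInChunks(ids, 2, chunk -> chunk);
        System.out.println(fetched);
    }
}
```

With Spring Data MongoDB, the callback might look something like `chunk -> mongoTemplate.find(Query.query(Criteria.where("_id").in(chunk)), MyDoc.class)`, where `MyDoc` is a placeholder for the mapped document class and the ids are assumed to be converted to ObjectId as needed. Note that $in does not return documents in the order of the id list, so the Solr ranking would have to be restored on the client side either way.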

answered Oct 20 '22 20:10

Wizard

Tags: mongodb, mongodb-query, spring-data-mongodb
