
MongoDB $all and $in very slow even on indexed fields

Tags:

mongodb

I have a collection of about 80 million documents, each storing an array of tags in the tags field, e.g.:

{text: "blah blah blah...", tags: ["car", "auto", "automobile"]}

The tags field is indexed, so queries like this are naturally almost instant:

 db.documents.find({tags:"car"})

However the following queries are all very slow, taking several minutes to complete:

 db.documents.find({tags:{$all:["car","phone"]}})
 db.documents.find({tags:{$in:["car","auto"]}})

The problem persists even if the array only has a single item:

 db.documents.find({tags:{$all:["car"]}})  //very slow too

I thought $all and $in would be fast because tags is indexed, but apparently that is not the case. Why?

asked Oct 06 '12 14:10 by ramirami


1 Answer

It turns out this is a known bug in MongoDB that has not been fixed as of version 2.2.

MongoDB does not perform index intersection when searching for multiple entries using $all. Only the first item in the array is looked up using indexes, and a scan of all matched documents is performed to filter the results.

For example, in the query db.documents.find({tags:{$all:["car","phone"]}}) all documents containing the tag "car" need to be retrieved and scanned. Since the collection in question contains over a hundred thousand documents tagged with "car", the slowdown is not surprising.
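The behaviour described above can be illustrated with a toy model. The sketch below is not MongoDB code: buildTagIndex and findAll are hypothetical helpers, the data set is made up (1000 documents instead of 80 million), and the only thing it models is the claim that the first $all element is resolved through the index while the remaining elements are checked by scanning each candidate document. It is plain JavaScript meant to be run with Node.

```javascript
// Toy model only: mimics "index lookup on the first $all element, then scan".
// buildTagIndex and findAll are hypothetical helpers, not MongoDB APIs.

// Build an inverted index: tag -> list of document ids that carry the tag.
function buildTagIndex(docs) {
  const index = new Map();
  docs.forEach((doc, id) => {
    for (const tag of doc.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(id);
    }
  });
  return index;
}

// Use the index for tags[0] only, then scan every candidate for the other tags.
function findAll(docs, index, tags) {
  const candidates = index.get(tags[0]) || [];
  let scanned = 0;
  const matches = [];
  for (const id of candidates) {
    scanned += 1; // each candidate document has to be fetched and inspected
    if (tags.every((t) => docs[id].tags.includes(t))) matches.push(id);
  }
  return { scanned, matches };
}

// Made-up data: 1000 documents tagged "car", 10 of them also tagged "phone".
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push({ tags: i < 10 ? ["car", "phone"] : ["car"] });
}
const index = buildTagIndex(docs);

const carFirst = findAll(docs, index, ["car", "phone"]);
const phoneFirst = findAll(docs, index, ["phone", "car"]);
console.log(carFirst.scanned, carFirst.matches.length);     // 1000 10
console.log(phoneFirst.scanned, phoneFirst.matches.length); // 10 10
```

Both calls return the same 10 matches, but the number of documents inspected depends entirely on which tag comes first in the array.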

Worse, MongoDB doesn't even perform the simple optimization of selecting the least represented item in the $all array for the index lookup. If there are 100,000 documents tagged "car" and 10 documents tagged "phone", MongoDB will still scan 100,000 documents to return results for {$all:["car", "phone"]}.
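If the server really does use only the first $all element for the index lookup, as described above, one possible client-side workaround is to put the rarest tag first yourself. The following is a minimal sketch under that assumption. buildAllQuery and the tagCounts object are hypothetical: the counts would have to come from somewhere you maintain, for example a cached result of db.documents.count({tags: "phone"}) per tag. Tags with no known count are treated as the most common, which is also an assumption.

```javascript
// Hypothetical helper (not part of MongoDB): order the $all array rarest-first
// so that the element used for the index lookup yields the fewest candidates.
function buildAllQuery(tags, tagCounts) {
  // Unknown tags sort last, i.e. they are assumed to be very common.
  function countOf(tag) {
    return Object.prototype.hasOwnProperty.call(tagCounts, tag)
      ? tagCounts[tag]
      : Number.MAX_VALUE;
  }
  // Copy before sorting so the caller's array is left untouched.
  var ordered = tags.slice().sort(function (a, b) {
    var ca = countOf(a), cb = countOf(b);
    return ca < cb ? -1 : ca > cb ? 1 : 0;
  });
  return { tags: { $all: ordered } };
}

// Counts taken from the example above: 100000 "car" documents, 10 "phone".
var tagCounts = { car: 100000, phone: 10 };
var query = buildAllQuery(["car", "phone"], tagCounts);
console.log(JSON.stringify(query)); // {"tags":{"$all":["phone","car"]}}
```

The resulting object could then be passed to db.documents.find(query). Appending .explain() to both orderings and comparing the reported nscanned value should show whether the reordering actually helps on your server version.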

See also: https://jira.mongodb.org/browse/SERVER-1000

answered Oct 17 '22 11:10 by ramirami