I have a collection of about 80 million documents, each of which stores an array of tags in the tags field, e.g.:
{text: "blah blah blah...", tags: ["car", "auto", "automobile"]}
The tags field is indexed, so naturally queries like this are almost instant:
db.documents.find({tags:"car"})
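(For context, the index is an ordinary single-field multikey index on the array; the exact command isn't shown here, but I'm assuming it was created with something like the following. ensureIndex was the shell method in the 2.2 era; createIndex works in later versions.)
// multikey index on the tags array
db.documents.ensureIndex({tags: 1})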
However, the following queries are all very slow, taking several minutes to complete:
db.documents.find({tags:{$all:["car","phone"]}})
db.documents.find({tags:{$in:["car","auto"]}})
The problem persists even if the array only has a single item:
db.documents.find({tags:{$all:["car"]}}) //very slow too
I thought $all and $in would be very fast because tags is indexed, but apparently that is not the case. Why?
It turns out this is a known bug in MongoDB that hasn't been fixed as of version 2.2.
MongoDB does not perform index intersection when searching for multiple entries using $all. Only the first item in the array is looked up using the index, and a scan of all matched documents is performed to filter the results.
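You can see this for yourself with explain(). In the 2.2-era shell the output includes nscanned (index entries examined), n (documents returned), and indexBounds; this is a sketch of how to check, not output from the asker's data:
// compare how much work the index does for each query
db.documents.find({tags: "car"}).explain()
db.documents.find({tags: {$all: ["car", "phone"]}}).explain()
// in the second explain, indexBounds should only cover "car", and nscanned
// will be close to the total number of documents tagged "car"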
For example, in the query db.documents.find({tags:{$all:["car","phone"]}}), all documents containing the tag "car" need to be retrieved and scanned. Since the collection in question contains over a hundred thousand documents tagged with "car", the slowdown is not surprising.
Worse, MongoDB doesn't even perform the simple optimization of selecting the least-represented item in the $all array for the index lookup. If there are 100,000 documents tagged "car" and 10 documents tagged "phone", MongoDB will still need to scan 100,000 documents to return results for {$all:["car", "phone"]}.
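Since only the first element of the $all array drives the index lookup, one workaround (an inference from the behaviour described above, worth verifying with explain() on your own data) is to put the most selective tag first yourself:
// place the rarest tag first so the index lookup starts from the smaller set
// ("phone" is assumed to match far fewer documents than "car")
db.documents.find({tags: {$all: ["phone", "car"]}})
With the rare tag first, the index lookup retrieves only the ~10 "phone" documents and the scan for "car" happens over that small set, instead of scanning the ~100,000 "car" documents.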
See also: https://jira.mongodb.org/browse/SERVER-1000