I have a collection with large number of documents (32 809 900). All documents have a field called soft_deleted
. I've also created soft_deleted: 1
field. Then I've tested a few different queries related to the field. Here are my results:
Query Number of results Time in milliseconds
db.cgm_egvs
.find().count() 32809900 90
db.cgm_egvs
.find({soft_deleted: true}) 2820897 688
.count()
db.cgm_egvs
.find({soft_deleted: false}) 29989003 3983
.count()
db.cgm_egvs
.find({soft_deleted: null}) 0 42
.count()
db.cgm_egvs
.find({soft_deleted: {$ne: true}}) 29989003 82397
.count()
Why are query times so different between these queries? I'd expect finding documents where soft_deleted
is true
or false
to take the same amount of time.
More importantly, why is querying by != true
so much slower than any other query?
The soft_deleted field has very low cardinality; it has only two distinct values true and false, so you will not have much benefit having an index on this field. Normally indexes perform better on fields with high cardinality.
In case of {soft_deleted: true} query, the number of rows with soft_deleted: true is very less compared to {soft_deleted: false}, and mongodb had to scan much lower number of index entries. So the {soft_deleted: true} query took less time.
Similarly the query {soft_deleted: null} took less time as the index has only 2 distinct values, and in this case much lower scanning is required.
Your final query is using $ne operator, and $ne operator is not selective (selectivity is the ability of a query to narrow results using the index). https://docs.mongodb.com/v3.0/faq/indexes/#using-ne-and-nin-in-a-query-is-slow-why. So it took much more time to execute.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With