I have some problems with very slow distinct commands that use a query. From what I have observed, the distinct command only makes use of an index if you do not specify a query.
I have created a test database on my MongoDB 3.0.10 server with 1 million documents. Each document looks as follows:
{
"_id" : ObjectId("56e7fb5303858265f53c0ea1"),
"field1" : "field1_6",
"field2" : "field2_10",
"field3" : "field3_29",
"field4" : "field4_64"
}
The numbers at the end of the field values are random integers from 0 to 99.
Two single-field indexes and one compound index have been created on the collection:
{ "field1" : 1 } # simple index on "field1"
{ "field2" : 1 } # simple index on "field2"
{ # compound index on all fields
"field2" : 1,
"field1" : 1,
"field3" : 1,
"field4" : 1
}
Now I execute distinct queries on that database:
db.runCommand({ distinct: 'dbtest', key: 'field1' })
The result contains 100 values with nscanned=100, and it used the index on "field1".
Now the same distinct command is restricted by a query:
db.runCommand({ distinct: 'dbtest', key: 'field1', query: { field2: "field2_10" } })
The result again contains 100 values, but nscanned=9991, and the index used is the third one (the compound index on all fields).
Next, I drop the third index that was used in the last query and run the same command again:
db.runCommand({ distinct: 'dbtest', key: 'field1', query: { field2: "field2_10" } })
The result again contains 100 values with nscanned=9991, and the index used is the one on "field2".
Conclusion: If I execute a distinct command without a query, the result is taken directly from an index. However, when I combine a distinct command with a query, only the query uses an index; the distinct operation itself does not use an index in that case.
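To make the gap between nscanned=100 and nscanned=9991 concrete, here is a toy model in plain JavaScript (runnable with Node.js). It is only a sketch and not MongoDB's implementation: the "index" is a sorted array of [field2, field1] pairs, the data is generated deterministically rather than randomly, and the names buildIndex, firstIndexWhere, distinctByRangeScan and distinctBySkipScan are invented for this illustration. It compares walking every index entry that matches the query with seeking directly to the next distinct value.

```javascript
// Toy model of a compound index { field2: 1, field1: 1 }:
// a sorted array of [field2, field1] pairs (not MongoDB's real structure).
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

function buildIndex(docs) {
  return docs.map(function (d) { return [d.field2, d.field1]; }).sort(compareKeys);
}

// Binary search for the first position where pred(key) is true.
// pred must be false for a prefix of the index and true for the rest.
function firstIndexWhere(index, pred) {
  var lo = 0, hi = index.length;
  while (lo < hi) {
    var mid = (lo + hi) >> 1;
    if (pred(index[mid])) hi = mid; else lo = mid + 1;
  }
  return lo;
}

// Strategy A: examine every index entry that matches the query.
function distinctByRangeScan(index, field2Value) {
  var values = [], examined = 0;
  var pos = firstIndexWhere(index, function (k) { return k[0] >= field2Value; });
  while (pos < index.length && index[pos][0] === field2Value) {
    examined++;
    var v = index[pos][1];
    // Entries are sorted, so duplicates of a value are adjacent.
    if (values[values.length - 1] !== v) values.push(v);
    pos++;
  }
  return { values: values, examined: examined };
}

// Strategy B: after reading one entry, seek past all entries with the same field1.
function distinctBySkipScan(index, field2Value) {
  var values = [], examined = 0;
  var pos = firstIndexWhere(index, function (k) { return k[0] >= field2Value; });
  while (pos < index.length && index[pos][0] === field2Value) {
    examined++;
    var v = index[pos][1];
    values.push(v);
    pos = firstIndexWhere(index, function (k) {
      return compareKeys(k, [field2Value, v]) > 0;
    });
  }
  return { values: values, examined: examined };
}

// Deterministic sample data: 10,000 documents, 10 field2 values,
// 100 field1 values, each (field2, field1) pair occurring 10 times.
var docs = [];
for (var i = 0; i < 10000; i++) {
  docs.push({
    field1: "field1_" + (i % 100),
    field2: "field2_" + (Math.floor(i / 100) % 10)
  });
}

var index = buildIndex(docs);
var rangeResult = distinctByRangeScan(index, "field2_3");
var skipResult = distinctBySkipScan(index, "field2_3");

console.log(rangeResult.values.length, rangeResult.examined); // 100 1000
console.log(skipResult.values.length, skipResult.examined);   // 100 100
```

Both strategies return the same 100 values, but the second one touches only one index entry per distinct value. That only works here because field1 directly follows the equality-matched field2 in the key order, which is also the case for the compound index in the question.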
My problem is that I need to perform a distinct command with a query on a very large database. The set of documents matching the query is very large but contains only ~100 distinct values. Therefore the complete distinct command takes ages (> 5 minutes), as it has to cycle through all of them.
What needs to be done so that the distinct command shown above can be answered by the database directly from an index?
The index is used automatically for distinct commands with a query if your MongoDB version supports it.
Using an index for a distinct command that includes a query requires MongoDB version 3.4 or higher; it works for both storage engines, MMAPv1 and WiredTiger.
See also the bug ticket https://jira.mongodb.org/browse/SERVER-19507
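One way to check whether the server answers the command from the index is to look at the explain output for a DISTINCT_SCAN stage. The following is a minimal sketch for the mongo shell, assuming the collection is named dbtest as in the question and that the plan is reported under queryPlanner.winningPlan (newer server versions may nest the plan differently). The helper names planUsesStage and distinctUsesIndexScan are made up for this example.

```javascript
// Returns true if the given explain plan node, or any of its input stages,
// has the given stage name.
function planUsesStage(node, stageName) {
  if (!node || typeof node !== "object") return false;
  if (node.stage === stageName) return true;
  if (planUsesStage(node.inputStage, stageName)) return true;
  return (node.inputStages || []).some(function (child) {
    return planUsesStage(child, stageName);
  });
}

// `db` is the database handle that the mongo shell provides.
// Assumption: the winning plan is found at queryPlanner.winningPlan.
function distinctUsesIndexScan(db) {
  var explain = db.dbtest
    .explain("queryPlanner")
    .distinct("field1", { field2: "field2_10" });
  return planUsesStage(explain.queryPlanner.winningPlan, "DISTINCT_SCAN");
}
```

In the shell, calling distinctUsesIndexScan(db) should return true when the distinct command is served by an index such as { field2: 1, field1: 1 } or the four-field compound index from the question, and false when the server falls back to scanning all matching entries.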