The events collection has a userId field and an array of events; each element of the array is an embedded document. Example:
{
    "_id" : ObjectId("4f8f48cf5f0d23945a4068ca"),
    "events" : [
        {
            "eventType" : "profile-updated",
            "eventId" : "247266",
            "eventDate" : ISODate("1938-04-27T23:05:51.451Z"),
        },
        {
            "eventType" : "login",
            "eventId" : "64531",
            "eventDate" : ISODate("1948-05-15T23:11:37.413Z"),
        }
    ],
    "userId" : "junit-19568842",
}
I am using a query like the one below to find events generated in the last 30 days:
db.events.find( { events : { $elemMatch : {
    "eventId" : 201,
    "eventDate" : { $gt : new Date(1231657163876) }
} } } ).explain()
The query plan shows that the index on "events.eventDate" is used when the test data contains few events (around 20):
{
    "cursor" : "BtreeCursor events.eventDate_1",
    "nscanned" : 0,
    "nscannedObjects" : 0,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "events.eventDate" : [
            [
                ISODate("2009-01-11T06:59:23.876Z"),
                ISODate("292278995-01--2147483647T07:12:56.808Z")
            ]
        ]
    }
}
However, when there is a large number of events (around 500), the index is not used:
{
    "cursor" : "BasicCursor",
    "nscanned" : 4,
    "nscannedObjects" : 4,
    "n" : 0,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
    }
}
Why is the index not being used when there are a lot of events? Maybe, when there is a large number of events, MongoDB decides that it is more efficient to scan all the documents than to use the index?
In MongoDB, you can use the cursor.explain() method or the db.collection.explain() method to determine whether or not a query uses an index.
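As a rough sketch of how that check could look for the query in the question: the filter below is copied from the question, and the index key { "events.eventDate" : 1 } is an assumption inferred from the index name events.eventDate_1 in the first plan. The db and printjson globals only exist inside the mongo shell, so the shell calls are guarded and the snippet also loads as plain JavaScript.

```javascript
// Filter copied from the question.
const query = {
  events: {
    $elemMatch: {
      eventId: 201,
      eventDate: { $gt: new Date(1231657163876) }
    }
  }
};

// `db` is a mongo shell global; outside the shell this branch is skipped.
if (typeof db !== "undefined") {
  // Plan chosen by the optimizer.
  printjson(db.events.find(query).explain());

  // Same query forced onto the index (key pattern assumed from the
  // index name "events.eventDate_1" shown in the question).
  printjson(db.events.find(query).hint({ "events.eventDate": 1 }).explain());
}
```

Comparing "nscanned" and "millis" between the two outputs should show whether the forced index plan would actually have been cheaper than the plan that was picked.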
Each collection in MongoDB automatically has an index on the _id field. This index can then be used to fetch documents from the database efficiently. However, you will need to query data on other specific fields most of the time. This is where a single field index will come in handy.
Is it a good idea to have an index on the Timestamp column? Yes, it is generally a good idea to have an index on a field used in the query criteria. This is especially useful when there is a large number of documents (e.g., a million) in the collection, because the index lets the query run fast.
Internally, Date objects are stored as a signed 64-bit integer representing the number of milliseconds since the Unix epoch (Jan 1, 1970). Not all database operations and drivers support the full 64-bit range. You may safely work with dates with years within the inclusive range 0 through 9999.
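To make the millisecond representation concrete, here is a small sketch in plain JavaScript. The timestamp is the one from the query in the question. The daysAgo helper is hypothetical (it is not part of MongoDB or the shell) and only illustrates one way a "last 30 days" cutoff could be computed instead of being hard-coded.

```javascript
// Millisecond timestamp taken from the query in the question.
const cutoffFromQuestion = new Date(1231657163876);
console.log(cutoffFromQuestion.toISOString()); // 2009-01-11T06:59:23.876Z

// Hypothetical helper: a Date that lies `days` days before `now`.
function daysAgo(days, now = new Date()) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - days * MS_PER_DAY);
}

// Filter shaped like the one in the question, with a computed cutoff.
const filter = {
  events: {
    $elemMatch: {
      eventId: 201,
      eventDate: { $gt: daysAgo(30) }
    }
  }
};
```

The ISO string printed above matches the lower index bound in the first explain() output, which shows that the hard-coded value in the question corresponds to January 2009.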
MongoDB's query optimizer works in an unusual way. Rather than calculating the cost of each candidate query plan, it simply runs all available plans. Whichever returns first is considered the optimal one and is reused for that query in the future.
As the application grows and the data grows and changes, the chosen plan may stop being optimal at some point. So MongoDB repeats the plan selection process every once in a while.
It appears that in this particular case, a basic scan was the most efficient.
Link: http://www.mongodb.org/display/DOCS/Query+Optimizer
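The "run every plan and keep whichever finishes first" idea can be illustrated with a toy model in plain JavaScript. This is only a sketch of the principle described above, not MongoDB's implementation: all function names and the data are made up, event dates are plain numbers, and each plan is a generator that yields once per unit of work. It does not try to reproduce the data from the question.

```javascript
// Toy plan 1: examine every document once.
function* collectionScan(docs, cutoff) {
  const matches = [];
  for (const doc of docs) {
    yield; // one document examined
    if (doc.events.some((e) => e.eventDate > cutoff)) matches.push(doc);
  }
  return matches;
}

// Toy plan 2: a multikey index holds one entry per array element.
function* multikeyIndexScan(docs, cutoff) {
  const entries = [];
  for (const doc of docs) {
    for (const e of doc.events) entries.push({ key: e.eventDate, doc });
  }
  const matches = new Set();
  for (const entry of entries) {
    if (entry.key <= cutoff) continue; // outside the bounds, costs nothing here
    yield; // one in-range index entry examined
    matches.add(entry.doc);
  }
  return [...matches];
}

// Advance every plan one step per round; the first plan to finish wins.
function racePlans(plans) {
  if (plans.length === 0) throw new Error("no candidate plans");
  const running = plans.map((p) => ({ name: p.name, it: p.run() }));
  for (;;) {
    for (const p of running) {
      const step = p.it.next();
      if (step.done) return { winner: p.name, matches: step.value };
    }
  }
}

// Made-up data: 4 documents, each with 50 events newer than the cutoff.
const cutoff = 1000;
const docs = Array.from({ length: 4 }, (_, i) => ({
  userId: "user-" + i,
  events: Array.from({ length: 50 }, (_, j) => ({ eventDate: cutoff + 1 + j }))
}));

const outcome = racePlans([
  { name: "index", run: () => multikeyIndexScan(docs, cutoff) },
  { name: "scan", run: () => collectionScan(docs, cutoff) }
]);
console.log(outcome.winner, outcome.matches.length); // scan 4
```

In this made-up setup the scan only has 4 documents to look at, while the index plan has 200 in-range entries to walk, so the scan finishes first. With few in-range entries the same race goes to the index plan.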