Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB: Slow query, even with index

I have a webpage, which uses MongoDB for storing and retrieving various measurements. Suddenly, in some point, my webpage became so sluggish it became unusable. It turns out, my database is the culprit.

I searched for and have not found any solution for my problem, and I apologize, as I am pretty new to MongoDB and pulling my hair out at the moment.

Version of MongoDB I am using is 2.4.6, on VM Machine with 20GB RAM, which runs Ubuntu server 12.04. There is no replica or sharding set up.

Firstly, I set my profiling level to 2 and it revealed the slowest query:

db.system.profile.find().sort({"millis":-1}).limit(1).pretty()
{
        "op" : "query",
        "ns" : "station.measurement",
        "query" : {
                "$query" : {
                        "e" : {
                                "$gte" : 0
                        },
                        "id" : "180"
                },
                "$orderby" : {
                        "t" : -1
                }
        },
        "ntoreturn" : 1,
        "ntoskip" : 0,
        "nscanned" : 3295221,
        "keyUpdates" : 0,
        "numYield" : 6,
        "lockStats" : {
                "timeLockedMicros" : {
                        "r" : NumberLong(12184722),
                        "w" : NumberLong(0)
                },
                "timeAcquiringMicros" : {
                        "r" : NumberLong(5636351),
                        "w" : NumberLong(5)
                }
        },
        "nreturned" : 0,
        "responseLength" : 20,
        "millis" : 6549,
        "ts" : ISODate("2015-03-16T08:57:07.772Z"),
        "client" : "127.0.0.1",
        "allUsers" : [ ],
        "user" : ""
}

I ran that specific query with .explain() and looks like, it uses index as it should, but it takes too long. I also ran that same query on my another, drastically weaker server and sput out the results like a champ in a second.

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 660385,
        "nscannedObjectsAllPlans" : 1981098,
        "nscannedAllPlans" : 3301849,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 7,
        "nChunkSkips" : 0,
        "millis" : 7243,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

Next, I looked into indexes of measurement collection and it looked fine to me:

> db.measurement.getIndexes()
[
        {
                "v" : 1,
                "key" : {
                        "_id" : 1
                },
                "ns" : "station.measurement",
                "name" : "_id_"
        },
        {
                "v" : 1,
                "key" : {
                        "t" : 1
                },
                "ns" : "station.measurement",
                "name" : "t_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "d" : 1,
                        "_id" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_d_1__id_-1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : 1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_1"
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "t" : -1,
                        "e" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_t_-1_e_-1"
        }
]

Here is also the rest of information of my collection:

> db.measurement.stats()
{
        "ns" : "station.measurement",
        "count" : 157835456,
        "size" : 22377799512,
        "avgObjSize" : 141.77929395027692,
        "storageSize" : 26476834672,
        "numExtents" : 33,
        "nindexes" : 5,
        "lastExtentSize" : 2146426864,
        "paddingFactor" : 1.0000000000028617,
        "systemFlags" : 0,
        "userFlags" : 0,
        "totalIndexSize" : 30996614096,
        "indexSizes" : {
                "_id_" : 6104250656,
                "t_1" : 3971369360,
                "id_1_d_1__id_-1" : 8397896640,
                "id_1_t_-1_e_1" : 6261548720,
                "id_1_t_-1_e_-1" : 6261548720
        },
        "ok" : 1
}

I tried adding new index, repairing whole database, reindex. What am I doing wrong? I really appreciate any help as I desperately ran out of ideas.

UPDATE 1:

I added two indexes as suggested by Neil Lunn, some of the queries are a LOT faster:

{
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : 1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_1_t_-1",
                "background" : true
        },
        {
                "v" : 1,
                "key" : {
                        "id" : 1,
                        "e" : -1,
                        "t" : -1
                },
                "ns" : "station.measurement",
                "name" : "id_1_e_-1_t_-1",
                "background" : true
        }

Results I've got are interesting (not sure though they are relevant)

Next two queries differs by "id" only. Please notice, each query uses different index, why? Should I delete older ones?

> db.measurement.find({"id":"119", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_t_-1_e_1",
        "isMultiKey" : false,
        "n" : 840747,
        "nscannedObjects" : 840747,
        "nscanned" : 1047044,
        "nscannedObjectsAllPlans" : 1056722,
        "nscannedAllPlans" : 1311344,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 4,
        "nChunkSkips" : 0,
        "millis" : 3730,
        "indexBounds" : {
                "id" : [
                        [
                                "119",
                                "119"
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ]
        },
        "server" : "station:27017"
}

> db.measurement.find({"id":"180", "e":{$gte:0}}).sort({"t":-1}).explain()
{
        "cursor" : "BtreeCursor id_1_e_1_t_-1",
        "isMultiKey" : false,
        "n" : 0,
        "nscannedObjects" : 0,
        "nscanned" : 0,
        "nscannedObjectsAllPlans" : 0,
        "nscannedAllPlans" : 45,
        "scanAndOrder" : true,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "id" : [
                        [
                                "180",
                                "180"
                        ]
                ],
                "e" : [
                        [
                                0,
                                1.7976931348623157e+308
                        ]
                ],
                "t" : [
                        [
                                {
                                        "$maxElement" : 1
                                },
                                {
                                        "$minElement" : 1
                                }
                        ]
                ]
        },
        "server" : "station:27017"
}

Could the problem be somewhere else? What could cause that sudden "sluggishness"? I have several other collections, where queries are suddenly slower also.

Oh, and another thing. On that other server I have, indexes are the same as here before I added new ones. Yes, collection is a bit smaller but it is several times faster.

like image 445
Denis Avatar asked Mar 17 '15 09:03

Denis


People also ask

How can indexes speed up queries in MongoDB?

Indexes store a small portion of each collection's data set into separate traversable data structures. These indexes then enable your queries to perform at faster speeds by minimizing the number of disk accesses required with each request.

Do indexes slow down inserts in MongoDB?

If you have too many indexes it will slow down your inserts because mongo needs to rebuild all your indexes.

Why is MongoDB query slow?

The slow queries can happen when you do not have proper DB indexes. Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.

Are MongoDB indexes strongly consistent?

Indexes are strongly consistent with the underlying data. DynamoDB supports key-value queries only.


1 Answers

Then point here was in both the index and query ordering selections.

If you look at your earlier output from .explain() you will see that that there is a "min/max" range on the "t" element in your expression. By "moving that to the end" of the evaluation, you allow other filtering elements that are more important to the overall expression ( determine less possible matches of "e" to be the main factor before scanning though "t" in basically "everything".

It's a little bit DBA, but in the NoSQL world I do believe this becomes a programmer problem.

You essentially need to construct your "shortest match path" along the selected keys in order to get the most effective scan. That is why the altered results executes much faster.

like image 153
Neil Lunn Avatar answered Oct 01 '22 13:10

Neil Lunn