 

MongoDB - Querying performance for over 10 million records

First of all: I have already read a lot of posts about MongoDB query performance, but I didn't find a good solution.

Inside the collection, the document structure looks like:

{
    "_id" : ObjectId("535c4f1984af556ae798d629"),
    "point" : [
        -4.372925494081455,
        41.367710205649544
    ],
    "location" : [
        {
            "x" : -7.87297955453618,
            "y" : 73.3680160842939
        },
        {
            "x" : -5.87287143362673,
            "y" : 73.3674043270052
        }
    ],
    "timestamp" : NumberLong("1781389600000")
}

My collection already has an index:

db.collection.ensureIndex({timestamp:-1})

The query looks like:

db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000}})
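The plan shown in the EDIT below was obtained by chaining explain() onto the cursor, along these lines (parameter values illustrative):

db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000}}).explain()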

Despite this, the response time is too high, about 20-30 seconds (the exact time depends on the specified query parameters).

Any help is appreciated!

Thanks in advance.

EDIT: I changed the find parameters, replacing them with real data.

The above query takes 46 seconds, and this is the output of the explain() function:

{
    "cursor" : "BtreeCursor timestamp_1",
    "isMultiKey" : false,
    "n" : 124494,
    "nscannedObjects" : 124494,
    "nscanned" : 124494,
    "nscannedObjectsAllPlans" : 124494,
    "nscannedAllPlans" : 124494,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 45,
    "nChunkSkips" : 0,
    "millis" : 46338,
    "indexBounds" : {
        "timestamp" : [
            [
                1380520800000,
                1380558200000
            ]
        ]
    },
    "server" : "ip-XXXXXXXX:27017"
}
asked May 21 '14 by nach0



1 Answer

The explain output couldn't be more ideal. You found 124,494 documents via the index (nscanned), and all of them were valid results, so all of them were returned (n). It still wasn't an index-only (covered) query, because the query returns whole documents, which contain fields not stored in the index.
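If the client only needed the timestamps themselves, a projection restricted to indexed fields would make the query covered; a minimal sketch, assuming no other fields are needed:

// Project only the indexed field and exclude _id, so the query
// can be answered from the index alone (indexOnly: true).
db.collection.find(
    { "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000 } },
    { "timestamp" : 1, "_id" : 0 }
)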

The reason why this query is a bit slow could be the huge amount of data it returns. All the documents you found must be read from the hard drive (when the collection is cold), scanned, serialized, sent to the client over the network, and deserialized by the client.

Do you really need that much data for your use case? If the answer is yes, does responsiveness really matter? I do not know what kind of application you actually want to create, but I am wildly guessing that yours is one of three use cases:

  1. You want to show all that data in the form of some kind of report. That would mean the output is a huge list the user has to scroll through. In that case I would recommend pagination: only load as much data as fits on one screen and provide next and previous buttons. MongoDB pagination can be done with the cursor methods .limit(n) and .skip(n), as in the first sketch after this list.
  2. The above, but as some kind of offline report the user can download and then examine with all kinds of data-mining tools. In that case the initial load time would be acceptable, because the user will spend some time with the data they received.
  3. You don't want to show all of that raw data to the user, but process it and present it in some aggregated form, like a statistic or a diagram. In that case you could likely do all that work directly on the database with the aggregation framework, as in the second sketch after this list.
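
For case 1, a minimal pagination sketch (pageSize and page are hypothetical values that would come from the UI):

// Hypothetical paging parameters supplied by the client.
var pageSize = 50;
var page = 0;

// Fetch one screenful per request; the sort matches the existing index.
db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000 } })
    .sort({ "timestamp" : -1 })
    .skip(page * pageSize)
    .limit(pageSize)

Note that .skip(n) still walks over the skipped index entries, so for deep pages a range-based approach (remembering the last timestamp seen and querying from there) scales better.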
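For case 3, a sketch of server-side aggregation, assuming (hypothetically) that a per-hour document count over the same range is the statistic you want:

// Count documents per hour inside the range; only the small summary
// crosses the network instead of all 124,494 raw documents.
db.collection.aggregate([
    { "$match" : { "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000 } } },
    { "$group" : {
        "_id" : { "$subtract" : [ "$timestamp", { "$mod" : [ "$timestamp", 3600000 ] } ] },
        "count" : { "$sum" : 1 }
    } },
    { "$sort" : { "_id" : 1 } }
])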
answered Sep 24 '22 by Philipp