 

MongoDB - Querying performance for over 10 million records

First of all: I have already read a lot of posts about MongoDB query performance, but I didn't find a good solution.

Inside the collection, the document structure looks like:

{
    "_id" : ObjectId("535c4f1984af556ae798d629"),
    "point" : [
        -4.372925494081455,
        41.367710205649544
    ],
    "location" : [
        {
            "x" : -7.87297955453618,
            "y" : 73.3680160842939
        },
        {
            "x" : -5.87287143362673,
            "y" : 73.3674043270052
        }
    ],
    "timestamp" : NumberLong("1781389600000")
}

My collection already has an index:

db.collection.ensureIndex({timestamp:-1})

The query looks like:

db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000}})
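The plan shown in the EDIT below was obtained by chaining explain() onto the cursor, along these lines (parameter values illustrative):

db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000}}).explain()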

Despite this, the response time is too high, about 20-30 seconds (the exact time depends on the specified query parameters).

Any help is appreciated!

Thanks in advance.

EDIT: I changed the find parameters, replacing them with real data.

The above query takes 46 seconds, and this is the output of the explain() function:

{
    "cursor" : "BtreeCursor timestamp_1",
    "isMultiKey" : false,
    "n" : 124494,
    "nscannedObjects" : 124494,
    "nscanned" : 124494,
    "nscannedObjectsAllPlans" : 124494,
    "nscannedAllPlans" : 124494,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 45,
    "nChunkSkips" : 0,
    "millis" : 46338,
    "indexBounds" : {
        "timestamp" : [
            [
                1380520800000,
                1380558200000
            ]
        ]
    },
    "server" : "ip-XXXXXXXX:27017"
}
asked May 21 '14 by nach0



1 Answer

The explain output couldn't be more ideal. You found 124,494 documents via the index (nscanned), and all of them were valid results, so all of them were returned (n). It still wasn't an index-only (covered) query, because the query returns whole documents, which contain fields not stored in the index.
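If the client only needed the timestamps themselves, a projection restricted to indexed fields would make the query covered; a minimal sketch, assuming no other fields are needed:

// Project only the indexed field and exclude _id, so the query
// can be answered from the index alone (indexOnly: true).
db.collection.find(
    { "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000 } },
    { "timestamp" : 1, "_id" : 0 }
)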

The reason why this query is a bit slow could be the huge amount of data it returns. All the documents you found must be read from the hard drive (when the collection is cold), scanned, serialized, sent to the client over the network, and deserialized by the client.

Do you really need that much data for your use case? If the answer is yes, does responsiveness really matter? I do not know what kind of application you actually want to create, but I am wildly guessing that yours is one of three use cases:

  1. You want to show all that data in the form of some kind of report. That would mean the output is a huge list the user has to scroll through. In that case I would recommend pagination: only load as much data as fits on one screen and provide next and previous buttons. MongoDB pagination can be done with the cursor methods .limit(n) and .skip(n), as in the first sketch after this list.
  2. The above, but as some kind of offline report the user can download and then examine with all kinds of data-mining tools. In that case the initial load time would be acceptable, because the user will spend some time with the data they received.
  3. You don't want to show all of that raw data to the user, but process it and present it in some aggregated form, like a statistic or a diagram. In that case you could likely do all that work directly on the database with the aggregation framework, as in the second sketch after this list.
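
For case 1, a minimal pagination sketch (pageSize and page are hypothetical values that would come from the UI):

// Hypothetical paging parameters supplied by the client.
var pageSize = 50;
var page = 0;

// Fetch one screenful per request; the sort matches the existing index.
db.collection.find({ "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000 } })
    .sort({ "timestamp" : -1 })
    .skip(page * pageSize)
    .limit(pageSize)

Note that .skip(n) still walks over the skipped index entries, so for deep pages a range-based approach (remembering the last timestamp seen and querying from there) scales better.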
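For case 3, a sketch of server-side aggregation, assuming (hypothetically) that a per-hour document count over the same range is the statistic you want:

// Count documents per hour inside the range; only the small summary
// crosses the network instead of all 124,494 raw documents.
db.collection.aggregate([
    { "$match" : { "timestamp" : { "$gte" : 1380520800000 , "$lte" : 1380546000000 } } },
    { "$group" : {
        "_id" : { "$subtract" : [ "$timestamp", { "$mod" : [ "$timestamp", 3600000 ] } ] },
        "count" : { "$sum" : 1 }
    } },
    { "$sort" : { "_id" : 1 } }
])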
answered Sep 24 '22 by Philipp