Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

MongoDB: does document size affect query performance?

Assume a mobile game that is backed by a MongoDB database containing a User collection with several million documents.

Now assume several dozen properties that must be associated with the user - e.g. an array of _id values of Friend documents, their username, photo, an array of _id values of Game documents, last_login date, count of in-game currency, etc, etc, etc..

My concern is whether creating and updating large, growing arrays on many millions of User documents will add any 'weight' to each User document, and/or slowness to the overall system.

We will likely never eclipse 16mb per document, but we can safely say our documents will be 10-20x larger if we store these growing lists directly.

Question: is this even a problem in MongoDB? Does document size even matter if your queries are properly managed using projection and indexes, etc? Should we be actively pruning document size, e.g. with references to external lists vs. embedding lists of _id values directly?

In other words: if I want a user's last_login value, will a query that projects/selects only the last_login field be any different if my User documents are 100kb vs. 5mb?

Or: if I want to find all users with a specific last_login value, will document size affect that sort of query?

like image 641
toblerpwn Avatar asked May 23 '14 20:05

toblerpwn


People also ask

How large should MongoDB documents be?

Document Size Limit The maximum BSON document size is 16 megabytes. The maximum document size helps ensure that a single document cannot use excessive amount of RAM or, during transmission, excessive amount of bandwidth. To store documents larger than the maximum size, MongoDB provides the GridFS API.

What makes MongoDB slow?

The slow queries can happen when you do not have proper DB indexes. Indexes support the efficient execution of queries in MongoDB. Without indexes, MongoDB must perform a collection scan, i.e. scan every document in a collection, to select those documents that match the query statement.


2 Answers

One way to rephrase the question is to say, does a 1 million document query take longer if documents are 16mb vs 16kb each.

Correct me if I'm wrong, from my own experience, the smaller the document size, the faster the query.

I've done queries on 500k documents vs 25k documents and the 25k query was noticeably faster - ranging anywhere from a few milliseconds to 1-3 seconds faster. On production the time difference is about 2x-10x more.

The one aspect where document size comes into play is in query sorting, in which case, document size will affect whether the query itself will run or not. I've reached this limit numerous times trying to sort as little as 2k documents.

More references with some solutions here: https://docs.mongodb.org/manual/reference/limits/#operations https://docs.mongodb.org/manual/reference/operator/aggregation/sort/#sort-memory-limit

At the end of the day, its the end user that suffers.

When I attempt to remedy large queries causing unacceptably slow performance. I usually find myself creating a new collection with a subset of data, and using a lot of query conditions along with a sort and a limit.

Hope this helps!

like image 193
Richard Kuo Avatar answered Oct 11 '22 15:10

Richard Kuo


First of all you should spend a little time reading up on how MongoDB stores documents with reference to padding factors and powerof2sizes allocation:

http://docs.mongodb.org/manual/core/storage/ http://docs.mongodb.org/manual/reference/command/collStats/#collStats.paddingFactor

Put simply MongoDB tries to allocate some additional space when storing your original document to allow for growth. Powerof2sizes allocation became the default approach in version 2.6, where it will grow the document size in powers of 2.

Overall, performance will be much better if all updates fit within the original size allocation. The reason is that if they don't, the entire document needs to be moved someplace else with enough space, causing more reads and writes and in effect fragmenting your storage.

If your documents are really going to grow in size by a factor of 10X to 20X overtime that could mean multiple moves per document, which depending on your insert, update and read frequency could cause issues. If that is the case there are a couple of approaches you can consider:

1) Allocate enough space on initial insertion to cover most (let's say 90%) of normal documents lifetime growth. While this will be inefficient in space usage at the beginning, efficiency will increase with time as the documents grow without any performance reduction. In effect you will pay ahead of time for storage that you will eventually use later to get good performance over time.

2) Create "overflow" documents - let's say a typical 80-20 rule applies and 80% of your documents will fit in a certain size. Allocate for that amount and add an overflow collection that your document can point to if they have more than 100 friends or 100 Game documents for example. The overflow field points to a document in this new collection and your app only looks in the new collection if the overflow field exists. Allows for normal document processing for 80% of the users, and avoids wasting a lot of storage on the 80% of user documents that won't need it, at the expense of additional application complexity.

In either case I'd consider using covered queries by building the appropriate indexes:

A covered query is a query in which:

all the fields in the query are part of an index, and all the fields returned in the results are in the same index. 

Because the index “covers” the query, MongoDB can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query.

Querying only the index can be much faster than querying documents outside of the index. Index keys are typically smaller than the documents they catalog, and indexes are typically available in RAM or located sequentially on disk.

More on that approach here: http://docs.mongodb.org/manual/tutorial/create-indexes-to-support-queries/

like image 30
John Petrone Avatar answered Oct 11 '22 15:10

John Petrone