We have a collection of log data, where each document in the collection is identified by a MAC address and a calendar day. Basically:
    {
        _id: <generated>,
        mac: <string>,
        day: <date>,
        data: [ "value1", "value2" ]
    }
Every five minutes, we append a new log entry to the data array within the current day's document. The document rolls over at midnight UTC, when we create a new document for each MAC.
We've noticed that IO, as measured by bytes written, increases all day long, and then drops back down at midnight UTC. This shouldn't happen, because the rate of log messages is constant. We believe that the unexpected behavior is due to Mongo moving documents, as opposed to updating their log arrays in place. For what it's worth, stats() shows that the paddingFactor is 1.0299999997858227.
Several questions:

To confirm that documents are being moved, it seems we can run db.setProfilingLevel(2), then db.system.profile.find(), and finally look for "moved:true" in the output, but I'm not sure whether it's OK to do this on a busy production system.
The following combination seems to cause write performance to fall off a cliff:
Presumably I/O becomes saturated. Changing either of these factors seems to prevent this from happening:
In addition, here are some other tricks that improve write throughput. With the exception of sharding, we found the improvements to be incremental, whereas we were trying to solve a "this doesn't work at all" kind of problem; I'm including them here in case you're looking for incremental improvements. The 10Gen folks did some testing and got similar results. One suggestion is to structure the day's data as an object keyed by hour:

    {"0":[...], "1":[...], ..., "23":[...]}

You'll notice that I've copied some of the suggestions from 10Gen here, just for completeness. Hopefully I did so accurately! If they publish a cookbook example, then I'll post a link here.