
In MongoDB, strategy for maximizing performance of writes to daily log documents

Tags: io, mongodb, nosql

We have a collection of log data, where each document in the collection is identified by a MAC address and a calendar day. Basically:

{
  _id: <generated>,
  mac: <string>,
  day: <date>,
  data: [ "value1", "value2" ]
}

Every five minutes, we append a new log entry to the data array within the current day's document. The document rolls over at midnight UTC when we create a new document for each MAC.
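
The append looks roughly like this in the mongo shell (the collection name `logs`, the literal MAC address, and the value are placeholders for illustration):

// Append one reading to the current UTC day's document.
db.logs.update(
  { mac: "00:11:22:33:44:55", day: ISODate("2011-11-04T00:00:00Z") },
  { $push: { data: "value1" } },   // add the five-minute log entry
  { upsert: true }                 // first write after midnight creates the new document
);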

We've noticed that IO, as measured by bytes written, increases all day long, and then drops back down at midnight UTC. This shouldn't happen because the rate of log messages is constant. We believe that the unexpected behavior is due to Mongo moving documents, as opposed to updating their log arrays in place. For what it's worth, stats() shows that the paddingFactor is 1.0299999997858227.

Several questions:

  1. Is there a way to confirm whether Mongo is updating in place or moving? We see some moves in the slow query log, but this seems like anecdotal evidence. I know I can db.setProfilingLevel(2), then db.system.profile.find(), and finally look for "moved:true" (see the sketch after this list), but I'm not sure whether it's ok to do this on a busy production system.
  2. The size of each document is very predictable and regular. Assuming that Mongo is doing a lot of moves, what's the best way to figure out why Mongo isn't able to presize more accurately, or to make it presize more accurately? Assuming that the above description of the problem is right, tweaking the padding factor does not seem like it would do the trick.
  3. It should be easy enough for me to presize the document and remove any guesswork from Mongo. (I know the padding factor docs say that I shouldn't have to do this, but I just need to put this issue behind me.) What's the best way to presize a document? It seems simple to write a document with a garbage byte array field, and then immediately remove that field from the document (see the sketch after this list), but are there any gotchas that I should be aware of? For example, I can imagine having to wait on the server for the write operation (i.e. do a safe write) before removing the garbage field.
  4. I was concerned about preallocating all of a day's documents at around the same time because it seems like this would saturate the disk at that time. Is this a valid concern? Should I try to spread out the preallocation costs over the previous day?
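
To make questions 1 and 3 concrete, here are rough mongo shell sketches; the collection name `logs`, the MAC address, the `filler` field name, and the 10KB size are placeholders, not measured values:

// Question 1: profile briefly, then look for updates flagged as moves.
// (Level 2 records every operation, so keep the window short on a busy system,
// or use level 1 with a slowms threshold instead.)
db.setProfilingLevel(2);
// ...let some writes happen...
db.system.profile.find({ moved: true }).sort({ ts: -1 }).limit(10);
db.setProfilingLevel(0);

// Question 3: presize by inserting the next day's document with a throwaway
// filler field of roughly the final size, then $unset it.
db.logs.insert({
  mac: "00:11:22:33:44:55",
  day: ISODate("2011-11-05T00:00:00Z"),
  data: [],
  filler: new Array(10240).join("x")   // ~10KB stand-in for the garbage bytes
});
db.getLastError();                      // safe write: confirm the insert landed first
db.logs.update(
  { mac: "00:11:22:33:44:55", day: ISODate("2011-11-05T00:00:00Z") },
  { $unset: { filler: 1 } }             // the document keeps its allocated space
);
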
asked Nov 04 '11 by jtoberon


1 Answer

The following combination seems to cause write performance to fall off a cliff:

  1. Journaling is on.
  2. Writes append entries to an array that makes up the bulk of a larger document.

Presumably I/O becomes saturated. Changing either of these factors seems to prevent this from happening:

  1. Turn journaling off. Use more replicas instead.
  2. Use smaller documents. Note that document size here is measured in bytes, not in the length of any arrays in the documents.
  3. Journal on a separate filesystem.

In addition, here are some other tricks that improve write throughput. With the exception of sharding, these were incremental improvements, whereas we were trying to solve a "this doesn't work at all" kind of problem; I'm including them here in case you're looking for incremental gains. The 10Gen folks did some testing and got similar results:

  1. Shard.
  2. Break up long arrays into several arrays, so that your overall structure looks more like a nested tree. If you use hour of the day as the key, then the daily log document becomes (see the sketch after this list):
    {"0":[...], "1":[...],...,"23":[...]}.
  3. Try manual preallocation. (This didn't help us. Mongo's padding seems to work as advertised. My original question was misguided.)
  4. Try different --syncdelay values. (This didn't help us.)
  5. Try without safe writes. (We were already doing this for the log data, and it's not possible in many situations. Also, this seems like a bit of a cheat.)
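
For item 2, a rough sketch of the hourly split (same placeholder collection and values as above; the mac and day fields from the question would still sit alongside the hour keys):

// Push the reading onto the current hour's sub-array instead of one long array.
var hour = "" + new Date().getUTCHours();   // e.g. "14"
var push = {};
push[hour] = "value1";
db.logs.update(
  { mac: "00:11:22:33:44:55", day: ISODate("2011-11-04T00:00:00Z") },
  { $push: push },
  { upsert: true }
);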

You'll notice that I've copied some of the suggestions from 10Gen here, just for completeness. Hopefully I did so accurately! If they publish a cookbook example, then I'll post a link here.

answered Sep 18 '22 by jtoberon