
Why does my MongoDB replica keep falling behind?

While converting a collection to a capped collection last night, my secondary's optime started lagging behind the primary's. It advanced slowly, a couple of seconds every several minutes, and eventually fell out of the primary's oplog window. Per the instructions here, I stopped mongod on the secondary, deleted all of its data files, and restarted it, though I forgot to lock the primary against writes. The secondary went through its initialization phase, which took a solid amount of time, and was finally back in business, but when I logged in, replication was even farther behind.
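
For reference, the lag and the oplog window can be checked from the mongo shell with the standard helpers (these exist at least as far back as 2.0):

    // on the primary: prints the oplog size and the time range it covers
    db.printReplicationInfo()
    // on any member: prints each secondary's lag relative to the primary
    db.printSlaveReplicationInfo()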

Since this is the cloud, after all, I created an image of my primary (which is supposed to copy all data), though I couldn't run db.fsyncLock() at the time, because it was taking some writes. The new image finishes, and I launch a new server based on that image, get it added to my replica set, remove the old secondary, and life is good, right? Not quite - the new secondary is lagging about an hour behind, and through the course of the day (and tonight) eventually gets to the point where it's 14 hours behind (though strangely enough still within the oplog window).
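
For reference, the usual pattern for a consistent disk image (which I couldn't use here, since the primary was taking writes) is to lock the node around the copy:

    db.fsyncLock()     // flush pending writes to disk and block new ones
    // ... take the disk image / snapshot while the node is locked ...
    db.fsyncUnlock()   // resume normal writes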

I take the next step from the "resyncing a stale member" page. I shut down mongod on both servers, gzip and copy my data folder from primary to secondary, unzip, fire them both up, and db.fsyncLock() my primary. What blows my mind is that even with THE SAME DATA, after initializing, my secondary says it's 1 hour behind. I add it back into the replica set, and it catches up quickly to being 5 minutes behind.

All good, right? No - flash forward, the secondary is advancing sloooooowly, and is now 20 minutes behind. Mongostat has the secondary pegged at 95+ locked %, and iostat -xm 2 doesn't show anything crazy - the primary is currently idle from not taking writes, and the secondary is definitely not doing much at all (.04 wMB/second). Not sure if it's worth mentioning, but the primary currently feels dog slow/unresponsive when logging into the mongo shell, etc.
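
In case it helps anyone else debugging this, db.currentOp() run against the lagging secondary shows what's in flight while it's pegged (field names per the 2.0-era shell; the 5-second cutoff is arbitrary):

    db.currentOp()    // everything currently running on this node
    // print only operations that have been running for a while
    db.currentOp().inprog.forEach(function(op) {
        if (op.secs_running && op.secs_running > 5) printjson(op);
    });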

What gives, Mongo? Why can't you just catch up? What am I doing wrong in trying to get my secondary caught up?

EDIT Answering questions:

  • Version: 2.0.4
  • Hardware: Both nodes are the same hardware, near as I can tell - 8GB RAM, quad core CPU. I assume it's something virtualized.
  • Write rate: it varies. As mentioned, last night I was converting to a capped collection, which triggered the whole thing. Overnight, there was a process writing a couple hundred small documents (~155 bytes each) a few times an hour, so at most I'd estimate around 100-200 KB/hour. During the day, processing was more intense, updating hundreds of thousands of 500-byte documents and writing a couple hundred thousand more. Still not talking about enormous data quantities. EDIT: found some iostat output from earlier today:
Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              1.00  2564.50  243.50  282.50  8986.00 11388.00    77.47    11.32   21.46    2.36   37.93   0.50  26.50

That one was particularly bursty at 11 wMB/s; I saw util% hit 34% with 7 wMB/s, and 72% at 52 rMB/s. So it's not saturated, but it was definitely a read-heavy workload in the morning. It is interesting, though, that despite the data being ~5GB with ~1GB of indexes (see below), there's so much disk activity. Shouldn't that all be in RAM?

  • Working set: I still haven't found the accepted methodology for calculating working set, but if it helps:
    "collections" : 21,
    "objects" : 15540092,
    "avgObjSize" : 325.26198326238995,
    "dataSize" : 5054601144,
    "storageSize" : 5874327552,
    "numExtents" : 132,
    "indexes" : 43,
    "indexSize" : 864366720,
    "fileSize" : 10666115072,
    "nsSizeMB" : 16,
    "ok" : 1

I can't imagine that that's overwhelming 8GB of RAM, though I could be wrong (a rough way to check is sketched after the mongostat samples below).

  • some recent mongostat samples from secondary:
insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn    set repl       time 
    *0     *0     *0     *0       0     1|0       0  22.2g  44.9g   912m      0     99.2          0       0|0     0|1     2k   303b   151 mySet  SEC   03:47:54 
    *0     *0     *0     *0       0     1|0       0  22.2g  44.9g  1.85g      0      101          0       0|0     0|1     3k   303b   151 mySet  SEC   03:48:04 
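
As a rough sanity check on the RAM question above (a sketch; serverStatus reports memory in MB, while db.stats() reports bytes):

    var mem = db.serverStatus().mem;
    print("resident MB: " + mem.resident + ", mapped MB: " + mem.mapped);
    var s = db.stats();
    // data + indexes, converted from bytes to MB for comparison
    print("data+index MB: " + Math.floor((s.dataSize + s.indexSize) / (1024 * 1024)));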

EDIT

Tried more things. I shut down the primary (now called A; the secondary will be B), deleted its data, and unzipped its snapshot (now a couple of hours old, but at this point, we're not writing anything new). Started A up with --fastsync, and it's still about 45 seconds behind B's (now primary) optime, which had been hanging out at around 02:19:52 UTC. Finally, about an hour later, A catches up, so I call rs.stepDown() on B. Instantly, rs.status() shows me that both servers have optimes around 04:08 UTC, but B (now secondary) is again lagging by 17 seconds...then 30 seconds...now 7 minutes...

EDIT

A few minutes after taking @matulef's suggestion and re-creating the indexes on my capped collections, as well as restarting the secondary's mongod process, its optime has only advanced a few seconds. The secondary's locked % from mongostat is still hovering between 95% and 104%, and interestingly enough, its res size was swinging pretty wildly from 100MB to 2GB and back again before settling around 1GB.

EDIT (the next evening)

Conclusion to the story - @matulef was on the right track; I should have been more careful about converting a replicated collection to a capped collection. What follows is what happened, though I don't advertise this as being data-safe - I freely acknowledge I may have lost some data in the process, so YMMV.

Creating the indexes for the capped collections on the primary (A) didn't propagate to the secondary (B), and A happened to fail over (not intentionally). Once B was primary, I manually created the indexes on the capped collections there, and the resync operation to bring A in line with B started moving quickly. Unfortunately for me, the oplog windows no longer lined up, so I did end up having to snapshot data from B over to A. Once I restarted mongod with the same set of data, A & B were happy again, and replication has been back in sync ever since.
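
For completeness, the manual fix was just an ensureIndex on _id for each affected capped collection (the collection name here is a placeholder):

    // run on the primary (B at this point), once per capped collection
    db.myCappedCollection.ensureIndex({ _id: 1 })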

asked Jul 11 '12 by awshepard

People also ask

What causes replication lag in MongoDB?

MongoDB replication lag occurs when a secondary node cannot replicate data fast enough to keep up with the rate at which data is being written to the primary node.

What is the maximum number of nodes in a MongoDB replica set?

MongoDB supports replica sets, which can have up to 50 nodes.

How does replica set work in MongoDB?

MongoDB achieves replication through the use of replica sets. A replica set is a group of mongod instances that host the same data set. In a replica set, one node is the primary and receives all write operations. All other instances, the secondaries, apply operations from the primary so that they have the same data set.
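
For illustration, a minimal replica set can be set up from the mongo shell like this (the hostnames and set name are placeholders):

    // run once, connected to one of the mongod instances
    rs.initiate({
        _id: "mySet",
        members: [
            { _id: 0, host: "host0.example.com:27017" },
            { _id: 1, host: "host1.example.com:27017" },
            { _id: 2, host: "host2.example.com:27017" }
        ]
    })
    rs.status()    // shows which member is primary and each member's optime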


1 Answer

The problem here is that capped collections do not have an _id index by default (and the "convertToCapped" command actually drops all indexes for that collection). This is an issue because the secondaries perform updates by applying ops from the oplog, which refer to documents by their _ids. If you are missing an _id index, each update requires a full table scan on the secondaries, causing them to lag far behind.
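
You can see this directly with explain() - on a capped collection with no _id index, an _id lookup reports a BasicCursor, i.e. a full scan (the collection name is a placeholder; cursor names are the pre-2.6 format):

    // "cursor" : "BasicCursor" in the output means no index was used,
    // so every oplog-applied update scans the whole collection
    db.myCappedCollection.find({ _id: ObjectId() }).explain()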

The solution is to create an _id index on the capped collection. However, if you create the index on the primary, but your secondaries are already lagging behind, they won't receive the index creation operation quickly enough. Instead, the best way to fix things is to first fix up each lagging secondary, one-by-one. For each one, shut it down and restart in standalone mode (on a different port, without the --replSet option), build the _id index, then add it back into the set. Finally, once the secondaries are fixed up, you can step down the primary and repeat the process with it too.
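
Concretely, the per-member procedure looks something like this (the port, database, and collection names are placeholders):

    // 1. Shut down the lagging secondary, then restart it standalone,
    //    e.g. `mongod --port 37017 --dbpath /data/db` (note: no --replSet).
    // 2. From a shell connected to port 37017, build the missing index:
    use myDatabase
    db.myCappedCollection.ensureIndex({ _id: 1 })
    // 3. Shut it down again and restart with the original --replSet options;
    //    once it rejoins, it can apply oplog updates by _id without scanning.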

Update: In MongoDB 2.0.x and earlier, capped collections do not have an _id index by default. However, the default behavior is scheduled to change in MongoDB 2.2, so capped collections created in 2.2+ will have an _id index created automatically, just as non-capped collections do. For capped collections created prior to 2.2, you will still need to create an _id index manually using the steps outlined above, but new collections should not suffer from the problems described here.
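
If you're not sure which collections are affected, here's a quick scan from the shell (a sketch using standard shell helpers):

    // list capped collections in the current database without an _id index
    db.getCollectionNames().forEach(function(name) {
        var coll = db.getCollection(name);
        if (!coll.isCapped()) return;
        var hasIdIndex = coll.getIndexes().some(function(ix) {
            return ix.key && ix.key._id === 1;
        });
        if (!hasIdIndex) print(name + ": capped, no _id index");
    });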

answered by matulef