While converting a collection to a capped collection last night, my secondary's optime started lagging behind the primary's. It advanced slowly, a couple of seconds every several minutes, and eventually fell out of the primary's oplog window. Per the instructions here, I stopped mongod on the secondary, deleted all of its data files, and restarted it, though I forgot to lock the primary against writes. The secondary went through its initial sync, which took a good while, and was finally back in business, but when I logged in, replication was even farther behind than before.
Since this is the cloud, after all, I created an image of my primary (which is supposed to copy all data), though I couldn't run db.fsyncLock() at the time because it was taking writes. The new image finished, I launched a new server based on that image, added it to my replica set, and removed the old secondary, so life is good, right? Not quite: the new secondary was lagging about an hour behind, and over the course of the day (and night) eventually got to the point where it was 14 hours behind (though, strangely enough, still within the oplog window).
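For reference, the consistent-snapshot procedure I skipped here is to lock the primary first. A rough sketch in the mongo shell, assuming journaling is enabled; the snapshot step itself is whatever your cloud provider offers:

```shell
# On the PRIMARY: flush dirty pages to disk and block further writes.
mongo --eval 'db.fsyncLock()'

# ... take the disk/EBS snapshot out-of-band while the lock is held ...

# Unlock so the primary resumes taking writes.
mongo --eval 'db.fsyncUnlock()'
```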
I took the next step from the "resyncing a stale member" page: shut down mongod on both servers, gzipped and copied my data folder from primary to secondary, unzipped it, fired them both up, and ran db.fsyncLock() on my primary. What blows my mind is that even with THE SAME DATA, after initializing, my secondary says it's an hour behind. I added it back into the replica set, and it caught up quickly to being 5 minutes behind.
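The file-copy resync above looks roughly like this; the hostname and dbpath are placeholders, and both mongod processes must be stopped first:

```shell
# On the primary: archive the data directory and ship it over.
tar czf dbpath.tar.gz -C /var/lib/mongodb .
scp dbpath.tar.gz secondary-host:/tmp/

# On the secondary: replace its data directory with the primary's copy.
rm -rf /var/lib/mongodb/*
tar xzf /tmp/dbpath.tar.gz -C /var/lib/mongodb

# Then restart mongod on both servers with their usual --replSet options.
```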
All good, right? No. Flash forward: the secondary is advancing slowly and is now 20 minutes behind. mongostat has the secondary pegged at 95+ locked %, and iostat -xm 2 doesn't show anything crazy; the primary is currently idle since it isn't taking writes, and the secondary is definitely not doing much at all (0.04 wMB/second). Not sure if it's worth mentioning, but the primary currently feels dog slow (unresponsive logging into the mongo shell, etc.).
What gives, Mongo? Why can't you just catch up? What am I doing wrong in trying to get my secondary caught up?
EDIT Answering questions:
Device:  rrqm/s   wrqm/s     r/s     w/s    rkB/s     wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
xvda       1.00  2564.50  243.50  282.50  8986.00  11388.00     77.47     11.32  21.46     2.36    37.93   0.50  26.50
That one was particularly bursty at 11 wMB/s, saw util% hit 34% with 7 wMB/s, and 72% at 52 rMB/s. So not saturated, but definitely a read-heavy workload in the morning. It is interesting though that despite having obj. size ~5GB, and ~1GB indexes (see below), there's so much disk activity. Shouldn't that all be in RAM?
{
    "collections" : 21,
    "objects" : 15540092,
    "avgObjSize" : 325.26198326238995,
    "dataSize" : 5054601144,
    "storageSize" : 5874327552,
    "numExtents" : 132,
    "indexes" : 43,
    "indexSize" : 864366720,
    "fileSize" : 10666115072,
    "nsSizeMB" : 16,
    "ok" : 1
}
I can't imagine that that's overwhelming 8GB of RAM, though I could be wrong.
insert  query  update  delete  getmore  command  flushes  mapped  vsize  res    faults  locked %  idx miss %  qr|qw  ar|aw  netIn  netOut  conn  set    repl  time
    *0     *0      *0      *0        0      1|0        0   22.2g  44.9g  912m        0      99.2           0    0|0    0|1     2k    303b   151  mySet  SEC   03:47:54
    *0     *0      *0      *0        0      1|0        0   22.2g  44.9g  1.85g       0       101           0    0|0    0|1     3k    303b   151  mySet  SEC   03:48:04
EDIT
Tried more things. I shut down the primary (now called A; the secondary will be B), deleted its data, and unzipped its snapshot (now a couple of hours old, but at this point we're not writing anything new). Started A up with --fastsync, and it's still about 45 seconds behind B's (now the primary) optime, which had been hanging out at about 02:19:52 UTC. Finally, about an hour later, A catches up, so I call rs.stepDown() on B. Instantly, rs.status() shows me that both servers have optimes around 04:08 UTC, but B (now the secondary) is again lagging by 17 seconds... then 30 seconds... now 7 minutes...
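For anyone following along, a --fastsync restart is just the normal startup with one extra flag; the dbpath and set name here are assumptions. The flag tells mongod to trust the already-seeded data files instead of performing a full initial sync:

```shell
# Data files from the snapshot must already be in the dbpath.
mongod --dbpath /var/lib/mongodb --replSet mySet --fastsync
```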
EDIT
A few minutes after taking @matulef's suggestion and re-creating indexes on my capped collections, as well as restarting the secondary's mongod process, its optime has only increased by a few seconds. The secondary's locked % from mongostat is still hovering between 95 and 104%, and interestingly enough, its res size was swinging wildly from 100M to 2GB and back again before settling around 1GB.
EDIT (the next evening)
Conclusion to the story: @matulef was on the right track, and I should have been more careful about converting a replicated collection to a capped collection. What follows is what happened, though I don't advertise it as being data safe; I freely acknowledge I may have lost some data in this process, so YMMV.
Creating the indexes for the capped collections on the primary (A) didn't propagate to the secondary (B), and A happened to fail over (not intentionally). Once B was primary, I manually created the indexes on the capped collections there, and the resync operation to bring A in line with B started moving quickly. Unfortunately for me, my oplog windows no longer lined up, so I ended up having to snapshot data from B to A. Once I restarted mongod with the same set of data, A and B were happy again, and replication has been back in sync ever since.
The problem here is that capped collections do not have an _id index by default (and the "convertToCapped" command actually drops all indexes for that collection). This is an issue because the secondaries perform updates by applying ops from the oplog, which refer to documents by their _ids. If you are missing an _id index, each update requires a full table scan on the secondaries, causing them to lag far behind.
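You can verify this on a lagging secondary; the collection name here is a placeholder:

```shell
# List indexes on the capped collection (run against the secondary;
# rs.slaveOk() permits reads on a secondary in the 2.0-era shell).
# After convertToCapped this typically prints an empty array, i.e. no _id index.
mongo --eval 'rs.slaveOk(); printjson(db.myCappedCollection.getIndexes())'
```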
The solution is to create an _id index on the capped collection. However, if you create the index on the primary, but your secondaries are already lagging behind, they won't receive the index creation operation quickly enough. Instead, the best way to fix things is to first fix up each lagging secondary, one-by-one. For each one, shut it down and restart in standalone mode (on a different port, without the --replSet option), build the _id index, then add it back into the set. Finally, once the secondaries are fixed up, you can step down the primary and repeat the process with it too.
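The per-secondary procedure might look like the following; the ports, dbpath, and collection name are assumptions (ensureIndex was the index-building command in the 2.0-era shell):

```shell
# 1. Shut down the lagging secondary, then restart it standalone,
#    on a different port and WITHOUT --replSet:
mongod --dbpath /var/lib/mongodb --port 37017

# 2. Build the _id index against the standalone instance:
mongo --port 37017 --eval 'db.myCappedCollection.ensureIndex({_id: 1})'

# 3. Shut it down again and restart with the normal replica-set options:
mongod --dbpath /var/lib/mongodb --port 27017 --replSet mySet
```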
Update: In MongoDB 2.0.x and earlier, capped collections do not have an _id index by default. However, the default behavior is scheduled to change in MongoDB 2.2, so that capped collections created in 2.2+ will have an _id index created automatically, just as with non-capped collections. For capped collections created prior to 2.2, you will still need to manually create an _id index using the steps outlined above, but new collections should not suffer from the problems seen here.