Should I increase the size of my MongoDB oplog file?

Tags:

mongodb

I understand that the oplog will split a multi-update into individual update operations, but what about batch inserts? Are those also split into individual inserts?

If I have a write-intensive collection with batches of ~20K docs being inserted roughly every 30 seconds, should I consider increasing my oplog size beyond the default? I have a 3-member replica set, and mongod is running on a 64-bit Ubuntu server install with the MongoDB data sitting on a 100GB volume.

Here is some data which may or may not be helpful:

    gs_rset:PRIMARY> db.getReplicationInfo()
    {
        "logSizeMB" : 4591.3134765625,
        "usedMB" : 3434.63,
        "timeDiff" : 68064,
        "timeDiffHours" : 18.91,
        "tFirst" : "Wed Oct 24 2012 22:35:10 GMT+0000 (UTC)",
        "tLast" : "Thu Oct 25 2012 17:29:34 GMT+0000 (UTC)",
        "now" : "Fri Oct 26 2012 19:42:19 GMT+0000 (UTC)"
    }
    gs_rset:PRIMARY> rs.status()
    {
        "set" : "gs_rset",
        "date" : ISODate("2012-10-26T19:44:00Z"),
        "myState" : 1,
        "members" : [
            {
                "_id" : 0,
                "name" : "xxxx:27017",
                "health" : 1,
                "state" : 1,
                "stateStr" : "PRIMARY",
                "uptime" : 77531,
                "optime" : Timestamp(1351186174000, 1470),
                "optimeDate" : ISODate("2012-10-25T17:29:34Z"),
                "self" : true
            },
            {
                "_id" : 1,
                "name" : "xxxx:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 76112,
                "optime" : Timestamp(1351186174000, 1470),
                "optimeDate" : ISODate("2012-10-25T17:29:34Z"),
                "lastHeartbeat" : ISODate("2012-10-26T19:44:00Z"),
                "pingMs" : 1
            },
            {
                "_id" : 2,
                "name" : "xxxx:27017",
                "health" : 1,
                "state" : 2,
                "stateStr" : "SECONDARY",
                "uptime" : 61301,
                "optime" : Timestamp(1351186174000, 1470),
                "optimeDate" : ISODate("2012-10-25T17:29:34Z"),
                "lastHeartbeat" : ISODate("2012-10-26T19:43:59Z"),
                "pingMs" : 1
            }
        ],
        "ok" : 1
    }

    gs_rset:PRIMARY> db.printCollectionStats()
    dev_fbinsights
    {
        "ns" : "dev_stats.dev_fbinsights",
        "count" : 6556181,
        "size" : 3117699832,
        "avgObjSize" : 475.53596095043747,
        "storageSize" : 3918532608,
        "numExtents" : 22,
        "nindexes" : 2,
        "lastExtentSize" : 1021419520,
        "paddingFactor" : 1,
        "systemFlags" : 0,
        "userFlags" : 0,
        "totalIndexSize" : 1150346848,
        "indexSizes" : {
            "_id_" : 212723168,
            "fbfanpage_id_1_date_1_data.id_1" : 937623680
        },
        "ok" : 1
    }
Asked Oct 26 '12 by Brian Hosie

1 Answer

The larger the size of the current primary's oplog, the longer the window of time a replica set member will be able to remain offline without falling too far behind the primary. If it does fall too far behind, it will need a full resync.

The timeDiffHours field returned by db.getReplicationInfo() reports how many hours' worth of operations the oplog currently holds. Once the oplog has filled up and begun overwriting its oldest entries, start monitoring this value, especially under heavy write load (when it will decrease). If you observe that it never drops below N hours, then N is the maximum number of hours you can tolerate a replica set member being temporarily offline (e.g. for routine maintenance, an offline backup, or a hardware failure) without it needing a full resync; the member can then automatically catch up to the primary after coming back online.
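As a back-of-the-envelope sketch of that reasoning (not part of the original answer), you can plug the numbers from the question's db.getReplicationInfo() output into a quick calculation; target_window_hours here is a hypothetical tolerance you would choose for your own operations:

```python
# Rough oplog-sizing sketch. The first three values are taken from the
# question's db.getReplicationInfo() output; substitute your own.
log_size_mb = 4591.3        # "logSizeMB"
used_mb = 3434.63           # "usedMB"
time_diff_hours = 18.91     # "timeDiffHours": hours of ops currently held

# Average oplog churn over the window the stats cover.
mb_per_hour = used_mb / time_diff_hours

# Oplog size needed for a member to survive being offline for
# target_window_hours (a hypothetical choice) without a full resync.
target_window_hours = 72
required_mb = mb_per_hour * target_window_hours

print(f"churn: {mb_per_hour:.0f} MB/h; "
      f"need ~{required_mb:.0f} MB for a {target_window_hours} h window")
```

With these figures the churn is roughly 180 MB/hour, so a 72-hour safety window would call for an oplog of roughly 13 GB rather than the current ~4.5 GB.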

If you're not comfortable with how low N is, then you should increase the size of the oplog. It depends entirely on how long your maintenance windows are and on how quickly you or your ops team can respond to disaster scenarios. Be liberal in how much disk space you allocate to the oplog, unless you have a compelling need for that space elsewhere.

I'm assuming here that you're keeping the size of the oplog constant over all replica set members, which is a reasonable thing to do. If not, then plan for the scenario where the replica set member with the smallest oplog gets elected primary.

(To answer your other question: yes. Just like multi-updates, batch inserts are fanned out into multiple individual insert operations in the oplog.)

Edit: Note that data imports and bulk inserts/updates will write to the oplog significantly faster than your application does under typical heavy load. To reiterate: be conservative in your estimate of how long it will take the oplog to fill.
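To make that concrete, here is a rough fill-rate check using the question's own figures (~20K docs every 30 seconds, avgObjSize ~476 bytes). This is illustrative only: real oplog entries carry per-operation overhead, so the true fill rate will be somewhat higher and the window shorter.

```python
# Back-of-the-envelope oplog fill-rate under the bulk-insert workload
# described in the question (figures taken from the question itself).
docs_per_batch = 20_000
batch_interval_s = 30
avg_obj_bytes = 476            # from avgObjSize in db.printCollectionStats()

bytes_per_hour = docs_per_batch * avg_obj_bytes * (3600 / batch_interval_s)
mb_per_hour = bytes_per_hour / (1024 * 1024)

oplog_mb = 4591.3              # "logSizeMB" from db.getReplicationInfo()
hours_to_fill = oplog_mb / mb_per_hour

print(f"~{mb_per_hour:.0f} MB/h at peak; "
      f"the oplog covers only ~{hours_to_fill:.1f} h of this load")
```

At that sustained rate the ~4.5 GB oplog would roll over in roughly four hours, far below the 18.91 hours reported under the mixed workload, which is exactly why the peak (not the average) write rate should drive the sizing decision.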

Answered Oct 29 '22 by J Rassi