Every one or two minutes, the mongo clients halt for about 3 seconds. Normal operation time for updates is about 1 or 2 milliseconds. When this slowness appears, we get a batch of updates lasting 1 to 3 seconds.
The slow query log does not show anything related to this, and neither does debugging the mongo client (mongo-php-client).
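One way to double-check from the application side is to enable the database profiler, so that slow operations are also recorded in system.profile and not only in the mongod log. This is a minimal sketch using the legacy mongo extension (which the debug output below suggests is in use); the threshold value is arbitrary.

// Diagnostic sketch: profile operations slower than 500 ms and read them back.
$client = new MongoClient('mongodb://primary_host.lan,secondary_host.lan', ['replicaSet' => 'rs0']);
$db     = $client->selectDB('database_name');

$db->command(['profile' => 1, 'slowms' => 500]);   // level 1 = profile slow operations only

$slow = $db->selectCollection('system.profile')
           ->find(['millis' => ['$gt' => 500]])
           ->sort(['ts' => -1])
           ->limit(10);

foreach ($slow as $op) {
    printf("%s %s %dms\n", $op['op'], $op['ns'], $op['millis']);
}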
The current architecture has 1 master, 1 slave, and 1 arbiter in the replica set.
The queries executed are always of the same sort (upsert by _id, or insert with a new MongoId). There is no "every few minutes we run this super-expensive update".
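For reference, the two write shapes look roughly like this. This is a sketch with the legacy driver; the collection and field names are hypothetical, not taken from the application.

$client     = new MongoClient('mongodb://primary_host.lan,secondary_host.lan', ['replicaSet' => 'rs0']);
$collection = $client->selectDB('database_name')->selectCollection('d_date_flow_owner');

// 1) Upsert keyed by a manually computed, non-incremental _id
$id = md5('owner|2015-07-07|flow');   // see the _id derivation sketch further down
$collection->update(
    ['_id' => $id],                   // _id derived from the record's dimensions
    ['$inc' => ['hits' => 1]],        // hypothetical counter field
    ['upsert' => true]
);

// 2) Plain insert with a new MongoId
$collection->insert(['_id' => new MongoId(), 'created' => new MongoDate()]);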
The blocking seems to be caused by local.oplog.rs; at least, that is what the mongotop output below shows. I haven't found any indication that the secondary is causing this issue, as the outputs of all the following commands seem to be stable there. I haven't found any information pointing to the slowness being caused by a specific query either.
The idea behind the way we store data is pre-aggregated reports. We have a lot of updates (a couple of hundred per second), but a very low rate of queries. The indexes are date bound, except for _id, which is calculated from the composite key based on all the dimensions the record contains. By this, I mean that the _id is not incremental, as it would be with an ObjectId index. Just to give an idea, the index sizes in the biggest collection are (in MB):
"indexSizes" : {
"_id_" : 1967,
"owner_date" : 230,
"flow_date" : 231,
"date" : 170
},
Most of the other collections have indexes of 100 MB or less. In all the collections, the _id index is the biggest one. It is worth noting that these ids are generated manually (based on the metadata, so the insertion is done as an upsert) and are not incremental.
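As an illustration of how such a non-incremental _id can be built: the actual dimension fields are not shown in the question, so the names below are hypothetical.

// Deterministic _id derived from the composite key of dimensions, so the same
// combination of dimensions always targets the same pre-aggregated document.
$dimensions = ['owner' => $owner, 'date' => $day, 'flow' => $flow, 'carrier' => $carrier];
$id = md5(implode('|', $dimensions));   // hashes the dimension values in a fixed order

// Used with an upsert as in the earlier sketch ($collection defined there);
// $setOnInsert fills the dimension fields only when the document is created.
$collection->update(
    ['_id' => $id],
    ['$setOnInsert' => $dimensions, '$inc' => ['hits' => 1]],
    ['upsert' => true]
);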
Edit 1: After more digging, it seems that the locking is related to the sync of the journal when it is flushed to disk. The journal is on the same filesystem as the data files, but the slowness is not expected, as the disks are fast SSD devices.
Edit 2: After some testing, the write capability of the disks does not seem to be an issue. We are usually writing at a rate of several megabytes per second, and in some tests the disk accepted 150 MB/s without problems.
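To check whether the stalls line up with WiredTiger checkpoints or journal flushes (the suspicion from Edit 1), one option is to sample serverStatus around a stall. This is a hedged sketch; the exact counter names under the wiredTiger section vary with the server version.

// Sample serverStatus once per second and print the WiredTiger transaction section,
// which contains checkpoint counters on recent versions.
$client = new MongoClient('mongodb://primary_host.lan,secondary_host.lan', ['replicaSet' => 'rs0']);
$admin  = $client->selectDB('admin');

for ($i = 0; $i < 10; $i++) {
    $status = $admin->command(['serverStatus' => 1]);
    if (isset($status['wiredTiger']['transaction'])) {
        print_r($status['wiredTiger']['transaction']);
    }
    sleep(1);
}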
Why is this downtime happening?
The following commands were run on the primary node.
Every time the slowness appears, we see the following in mongostat (two examples):
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn set repl time
10 *0 141 *0 93 120|0 0.4 80.0 0 11.5G 9.4G 0|1 1|0 110k 71k 274 rs0 PRI 13:15:44
12 *0 178 *0 72 105|0 0.2 80.0 1 11.5G 9.4G 0|0 1|0 111k 79k 274 rs0 PRI 13:15:45
47 *0 7 *0 0 159|0 0.1 80.0 0 11.5G 9.4G 0|0 2|1 15k 44k 274 rs0 PRI 13:15:49 !!!!HERE
14 *0 929 *0 99 170|0 0.2 80.0 0 11.5G 9.4G 0|0 1|0 419k 417k 274 rs0 PRI 13:15:50
21 *0 287 *0 124 181|0 0.2 80.0 0 11.5G 9.4G 0|0 1|0 187k 122k 274 rs0 PRI 13:15:51
insert query update delete getmore command % dirty % used flushes vsize res qr|qw ar|aw netIn netOut conn set repl time
10 *0 145 *0 70 108|0 0.3 79.9 0 11.5G 9.4G 0|0 1|0 98k 71k 274 rs0 PRI 13:16:48
11 *0 155 *0 72 111|0 0.2 79.9 1 11.5G 9.4G 0|0 1|0 103k 75k 274 rs0 PRI 13:16:49
44 *0 3 *0 0 144|0 0.0 79.9 0 11.5G 9.4G 0|2 1|0 11k 75k 274 rs0 PRI 13:16:53 !!!!HERE
11 *0 837 *0 94 134|0 0.2 79.9 0 11.5G 9.4G 0|0 1|0 377k 348k 274 rs0 PRI 13:16:54
12 *0 180 *0 86 139|0 0.2 79.9 0 11.5G 9.4G 0|0 1|0 122k 85k 274 rs0 PRI 13:16:55
14 *0 195 *0 83 124|0 0.2 79.9 0 11.5G 9.4G 0|0 2|0 125k 89k 274 rs0 PRI 13:16:56
The update column drops, and the following second shows many more updates. Note also that we are running mongostat with a 1-second delay; when the slowness appears, mongostat stops responding for a few seconds. The stall is only present on the master, not on the slave server.
This is the output of mongotop when this problem happens (at 2015-07-07T13:29:38):
(an example with a bit more context can be found here)
ns total read write 2015-07-07T13:29:33+02:00
database_name.d_date_flow_owner 555ms 550ms 4ms
local.oplog.rs 61ms 53ms 7ms
database_name.client_context_bbbbbbb 15ms 0ms 15ms
database_name.d_date_landing_owner 15ms 0ms 15ms
database_name.d_date_billing_owner 10ms 0ms 10ms
database_name.w_bl_carrier_country_date_device_flow_landing_manager_op1_os_owner_prod_site 7ms 0ms 7ms
database_name.d_carrier_country_date_device_flow_landing_op1_os_owner_site 5ms 0ms 5ms
database_name.d_country_date_owner 5ms 0ms 5ms
database_name.d_date_device_owner 5ms 0ms 5ms
database_name.d_date_os_owner 5ms 0ms 5ms
ns total read write 2015-07-07T13:29:37+02:00
database_name.client_context_bbbbbbb 2ms 0ms 2ms
database_name.client_context_aaaaaaaaa 1ms 0ms 1ms
admin.system.backup_users 0ms 0ms 0ms
admin.system.namespaces 0ms 0ms 0ms
admin.system.new_users 0ms 0ms 0ms
admin.system.profile 0ms 0ms 0ms
admin.system.roles 0ms 0ms 0ms
admin.system.users 0ms 0ms 0ms
admin.system.version 0ms 0ms 0ms
database_name 0ms 0ms 0ms
ns total read write 2015-07-07T13:29:38+02:00
local.oplog.rs 8171ms 4470ms 3701ms
database_name.d_date_op1_owner 45ms 0ms 45ms
database_name.d_date_device_owner 39ms 0ms 39ms
database_name.m_date_owner 34ms 0ms 34ms
database_name.d_date_owner 32ms 0ms 32ms
database_name.d_date_owner_site 31ms 0ms 31ms
database_name.d_carrier_date_owner 30ms 0ms 30ms
database_name.d_date_flow_owner 30ms 0ms 30ms
database_name.d_date_owner_product 28ms 0ms 28ms
database_name.d_carrier_country_date_device_flow_landing_op1_os_owner_site 27ms 0ms 27ms
ns total read write 2015-07-07T13:29:39+02:00
database_name.d_date_flow_owner 558ms 552ms 6ms
local.oplog.rs 62ms 61ms 1ms
database_name.d_carrier_date_owner 17ms 0ms 17ms
database_name.d_date_owner 16ms 0ms 16ms
database_name.w_bl_carrier_country_date_device_flow_landing_manager_op1_os_owner_prod_site 7ms 0ms 7ms
database_name.d_date_billing_owner 6ms 0ms 6ms
database_name.d_carrier_country_date_device_flow_landing_op1_os_owner_site 5ms 0ms 5ms
database_name.d_country_date_owner 5ms 0ms 5ms
database_name.d_date_device_owner 5ms 0ms 5ms
database_name.d_date_op1_owner 5ms 0ms 5ms
Debug output of the PHP mongo client showing the issue (note the 4-second gap between the last two "PhpMongoClient debug" lines):
(an example with a bit more context can be found here)
Update duration: 2ms
Update duration: 1ms
Update duration: 1ms
Update duration: 4006ms
PhpMongoClient debug: 2015-07-07 10:40:26 - PARSE (INFO): Parsing mongodb://primary_host.lan,secondary_host.lan
PhpMongoClient debug: 2015-07-07 10:40:26 - PARSE (INFO): - Found node: primary_host.lan:27017
[....]
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): limiting by credentials: done
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): sorting servers by priority and ping time
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): - connection: type: PRIMARY, socket: 42, ping: 0, hash: primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): sorting servers: done
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): selecting near servers
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): selecting near servers: nearest is 0ms
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): - connection: type: PRIMARY, socket: 42, ping: 0, hash: primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (FINE): selecting near server: done
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (INFO): pick server: random element 0
PhpMongoClient debug: 2015-07-07 10:40:26 - REPLSET (INFO): - connection: type: PRIMARY, socket: 42, ping: 0, hash: primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:26 - CON (FINE): No timeout changes for primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
PhpMongoClient debug: 2015-07-07 10:40:30 - CON (FINE): No timeout changes for primary_host.lan:27017;rs0;database_name/user/5ca571e7db198eeee3abee35857bfd53;30751
Update duration: 3943ms
Update duration: 3476ms
Update duration: 2008ms
Update duration: 961ms
Update duration: 956ms
Update duration: 20ms
Update duration: 20ms
Update duration: 3ms
Update duration: 42ms
Update duration: 24ms
Update duration: 25ms
Update duration: 56ms
Update duration: 24ms
Update duration: 11ms
Update duration: 11ms
Update duration: 3ms
Update duration: 2ms
Update duration: 3ms
Update duration: 1ms
Update duration: 1ms
Update duration: 1ms
Update duration: 2ms
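For reference, durations like the "Update duration" lines above can be produced with a simple wrapper around the update call. This is only a sketch; the actual instrumentation used by the application is not shown in the question.

// Time a single upsert and log it in the same format as the lines above
// ($collection and $id as in the earlier sketches).
$start = microtime(true);
$collection->update(['_id' => $id], ['$inc' => ['hits' => 1]], ['upsert' => true]);
$elapsedMs = (int) round((microtime(true) - $start) * 1000);
error_log(sprintf('Update duration: %dms', $elapsedMs));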
local.oplog.rs is the capped collection that holds the oplog, a special collection that records every operation that modifies the data stored in the database. It is created the first time a replica set member starts, with a default size (on WiredTiger, roughly 5% of free disk space). You set its size at creation using the oplogSizeMB setting; to resize the oplog after replica set initiation, use the "Change the Size of the Oplog" procedure.
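The oplog's size and utilisation can be checked with rs.printReplicationInfo() in the mongo shell, or from PHP via collStats on the local database. A minimal sketch with the legacy driver:

// collStats on the capped oplog collection reports its current and maximum size.
$client = new MongoClient('mongodb://primary_host.lan,secondary_host.lan', ['replicaSet' => 'rs0']);
$stats  = $client->selectDB('local')->command(['collStats' => 'oplog.rs']);
printf("oplog: %d bytes used of %d bytes max, capped: %s\n",
    $stats['size'], $stats['maxSize'], $stats['capped'] ? 'yes' : 'no');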
This issue has now disappeared. Two actions were undertaken:
- the workload sent to the cluster was reduced, and
- the MongoDB version was upgraded.
Unfortunately, the two changes were put online without much time in between them. I suspect that reducing the workload did the trick (which may or may not be linked to the issue that @steve-brisk pointed out), at least for now (we'll see when we hit the previous workload levels again). But, as the version was also updated, it might be that even with the previous workload levels we would not hit this issue again.
I have no evidence pointing to either of the two solutions alone, but after applying both of them, the issue is resolved.