MongoDB slows down every 2 hours and 10 minutes, exactly

For the past 3 months, my MongoDB server has been getting very slow every 2 hours and 10 minutes, with almost clockwork regularity.

My server configuration:

  • A 3-member replica set; for backup purposes, one member runs with a 3600-second replication delay (see the config sketch after this list).
  • There are no additional slave servers beyond the 3 replica set members.
  • Mongoose + Node.js provide a REST API.
  • About 9 reads and 1.5 writes per second on average, over 24 hours of statistics.
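
For reference, a 2.4-era replica set configuration matching this setup would look roughly like the sketch below (the hostnames are made up; slaveDelay requires priority: 0, and hidden keeps the delayed member away from normal reads):

    rs.initiate({
        _id: "rs0",
        members: [
            { _id: 0, host: "db1.example.com:27017" },
            { _id: 1, host: "db2.example.com:27017" },
            // delayed member kept purely for backup; it can never become primary
            { _id: 2, host: "db3.example.com:27017",
              priority: 0, hidden: true, slaveDelay: 3600 }
        ]
    });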

What I did after searching Stack Overflow and Google:

  • Restarting the server does NOT shift the 2-hour-10-minute interval.
  • Created indexes on all the fields I query; no impact.
  • Deleted the data files on one server and resynced it from another, then deleted the other's files and resynced back; no impact.
  • Stepped the primary over to another member; no impact.
  • Ran db.currentOp() while the database is slow: a lot of queries hang there (too many to paste here), but I didn't see any abnormal query (see the inspection sketch after this list).
  • In the mongo console, ran serverStatus while the database is slow; the command blocks until the database recovers.
  • top shows no memory-usage increase while the database is slow.
  • REST API endpoints that do not touch the database keep working fine.
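
For completeness, this is roughly how I filter the currentOp output during a stall (a minimal sketch; field names are from 2.4-era output, and the 10-second threshold is arbitrary):

    db.currentOp().inprog.forEach(function (op) {
        // only show operations that have been running for a while
        if (op.secs_running && op.secs_running > 10) {
            printjson({
                opid:    op.opid,
                ns:      op.ns,
                secs:    op.secs_running,
                waiting: op.waitingForLock,  // true => blocked on a lock
                query:   op.query
            });
        }
    });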

I guess something is holding a lock; the most likely cause is index building. There are a few unusual things about my database:

  • I have about 14000 collections in one database, and the count keeps increasing. A collection may hold anywhere from 1 to 3000 records.
  • Both the number of collections and the number of records grow dynamically.
  • Index fields are specified when each new collection is created (a sketch of this pattern follows below).
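
To make that concrete, the collection-creation pattern looks roughly like this hypothetical Mongoose sketch (the schema fields and the modelFor helper are illustrative, not my actual code):

    var mongoose = require('mongoose');

    // hypothetical record schema; the real field names differ
    var recordSchema = new mongoose.Schema({
        value: Number,
        ts:    Date
    });
    recordSchema.index({ ts: 1 });  // index fields declared up front

    // one model (and therefore one collection) per logical group;
    // Mongoose creates the collection and ensures the index on first use
    function modelFor(groupName) {
        return mongoose.models[groupName] ||
               mongoose.model(groupName, recordSchema, groupName);
    }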

This issue has been haunting me for 3 months. Any comments/suggestions will be highly appreciated!

Here are some logs from my log file:

Fri Jul 5 15:20:11.040 [conn2765] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 222694, after repl: 222694, at end: 222694 }

Fri Jul 5 17:30:09.367 [conn4711] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 199498, after repl: 199498, at end: 199528 }

Fri Jul 5 19:40:12.697 [conn6488] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after recordStats: 204061, after repl: 204061, at end: 204081 }
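
Note where the time goes in those lines: every section reports 0 until "after recordStats", which jumps to roughly 200,000 ms, so serverStatus spends its ~200 seconds gathering the recordStats section (which takes a read lock per database). If your build supports section toggles, as 2.4 does, you can ask serverStatus to skip that section and check whether it then returns during a stall:

    // skip recordStats, the section that needs a read lock per database
    db.serverStatus({ recordStats: 0 })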

Here is a screenshot of my Pingdom report: the server goes down for 4 minutes every 2 hours and 7 minutes. In the beginning, it went down for 2 minutes every 2 hours and 6 minutes. [screenshot: report from Pingdom]

[EDIT 1] More monitoring results from my hosting provider: CPU http://i.minus.com/iZBNyMPzLSLRr.png DiskIO http://i.minus.com/ivgrHr0Ghoz92.png Connections http://i.minus.com/itbfYq0SSMlNs.png The periodic jump in connections happens because connections are left waiting, and the current-connection count accumulates until the database is unblocked. It is not caused by heavy traffic.

asked Jul 15 '13 by Mason Zhang


1 Answer

We found the exact same 2:10 issue. In our case, it was an execution of dbStats by MMS. We had to upgrade the cluster, and the issue got resolved.
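
One way to confirm it (assuming, as in our case, a monitoring agent that issues dbStats) is to run the same command by hand and time it:

    var t0 = new Date();
    db.stats();   // the same dbStats command the MMS agent runs periodically
    print("dbStats took " + (new Date() - t0) + " ms");

If that single call takes minutes against a database with ~14,000 collections, the agent's periodic dbStats is the blocker.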

answered Sep 20 '22 by JAR.JAR.beans