
Why, after increasing max_dbs_open, do replications still fail with "increase max_dbs_open"?

Tags:

couchdb

Our application uses CouchDB filtered replications to move data between user databases and a master database. As we increase the number of users, replications start failing with this message:

Source and target databases out of sync. Try to increase max_dbs_open at both servers.

We've done that, increasing max_dbs_open to a ridiculously high value (10,000), but the failures and the message remain the same. Obviously something else is wrong. Does anyone know what it is?

asked Nov 15 '12 by lambmj

1 Answer

As it turns out, the message to increase max_dbs_open is at best a partial answer and at worst misleading. In our case the problem wasn't the number of open databases but the number of HTTP connections used by our many replications.

Each replication can use min(worker_processes + 1, http_connections) connections, where worker_processes is the number of workers assigned to each replication and http_connections is the maximum number of HTTP connections allotted to each replication, as described in the CouchDB replicator configuration documentation.

So the total number of connections used is

number of replications * min(worker_processes + 1, http_connections)

The default value of worker_processes is 4 and the default value of http_connections is 20, so with 100 replications the total number of HTTP connections used by replication is 500. Another setting, max_connections, determines the maximum number of HTTP connections a CouchDB server will allow at all, as described in the CouchDB httpd configuration documentation. Its default is 2048.
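To make the arithmetic concrete, here is a small Python sketch of the formula above; the function names are ours, not CouchDB's:

    # Connection arithmetic from the formula above; names are illustrative.
    def connections_per_replication(worker_processes=4, http_connections=20):
        # Upper bound on HTTP connections a single replication will use.
        return min(worker_processes + 1, http_connections)

    def total_replication_connections(num_replications, **kwargs):
        return num_replications * connections_per_replication(**kwargs)

    # With the defaults: min(4 + 1, 20) = 5 connections per replication,
    # so 100 replications use 500 connections, as described above.
    assert connections_per_replication() == 5
    assert total_replication_connections(100) == 500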

In our case each user has two replications: one from the user database to the master database and another from the master back to the user. So, with the default settings, each user we added consumed another 10 HTTP connections, and we eventually blew through the default max_connections.
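Continuing the sketch, the per-user cost makes it easy to estimate when the default cap runs out (the arithmetic here is ours, derived from the defaults above):

    # Two replications per user, five connections each with default settings.
    connections_per_user = 2 * min(4 + 1, 20)       # = 10
    max_connections = 2048                          # CouchDB default

    # Replication alone exhausts the cap after roughly this many users:
    print(max_connections // connections_per_user)  # 204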

Since our replications are minimal and move only a small amount of data between the user and master databases, we dialed back worker_processes and http_connections, increased max_connections, and all is well.
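For reference, these knobs live in CouchDB's ini configuration (e.g. local.ini). The values below are an illustrative sketch, not the exact numbers we used, and the section names are as in CouchDB 1.x:

    [replicator]
    worker_processes = 1    ; fewer workers per replication
    http_connections = 2    ; fewer HTTP connections per replication

    [httpd]
    max_connections = 4096  ; raise the server-wide connection cap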

UPDATE

A couple of other findings:

  1. It was necessary to raise the ulimit on open files for the CouchDB process to allow it to hold more open connections (see the sketch after this list).

  2. Creating replications too quickly also caused problems. Dialing back how quickly I created new replications eased the problem (also sketched below). YMMV.
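A rough Python sketch of both findings. The server URL, database names, and the one-second pause are assumptions for illustration; requests is a third-party HTTP client, and the resource module is Unix-only:

    import time
    import resource
    import requests  # third-party HTTP client, assumed installed

    # Finding 1: each HTTP connection consumes a file descriptor, so check
    # the open-file limit; raise it (e.g. with `ulimit -n`) before starting
    # CouchDB if the soft limit is low.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("open-file limit (soft, hard):", soft, hard)

    # Finding 2: pace replication creation instead of starting replications
    # in a burst. Database names below are hypothetical.
    for user_db in ["userdb-0001", "userdb-0002"]:
        requests.post(
            "http://localhost:5984/_replicate",
            json={"source": user_db, "target": "master", "continuous": True},
        )
        time.sleep(1)  # dial back how quickly new replications start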

answered Oct 28 '22 by lambmj