Why, after increasing max_dbs_open, do replications still fail with "increase max_dbs_open"?

Question

Our application uses CouchDB filtered replications to move data between user databases and a master database. As we increase the number of users, replications start failing with this message:

Source and target databases out of sync. Try to increase max_dbs_open at both servers.

We've done that, raising max_dbs_open to a ridiculously high value (10,000), but the failures and the message remain the same. Obviously something else is wrong. Does anyone know what it is?

lambmj · Accepted Answer

As it turns out, the advice to increase max_dbs_open is at best a partial answer and at worst misleading. In our case the problem wasn't the number of open databases but, apparently, the number of HTTP connections used by our many replications.

Each replication can use up to min(worker_processes + 1, http_connections) HTTP connections, where worker_processes is the number of worker processes assigned to each replication and http_connections is the maximum number of HTTP connections allotted to each replication, as described in the CouchDB configuration documentation.

So the total number of connections used is:

number of replications * min(worker_processes + 1, http_connections)

The default value of worker_processes is 4 and the default value of http_connections is 20, so each replication uses min(4 + 1, 20) = 5 connections. If there are 100 replications, replication therefore uses 500 HTTP connections in total. Another setting, max_connections, determines the maximum number of HTTP connections a CouchDB server will allow, as described in the CouchDB configuration documentation. Its default is 2048.
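The formula and defaults above can be checked with a small calculation. This is a minimal Python sketch that assumes the min(worker_processes + 1, http_connections) model described in this answer; the function names are hypothetical and not part of any CouchDB API.

```python
def connections_per_replication(worker_processes: int = 4, http_connections: int = 20) -> int:
    """Connections one replication may open under the model in this answer."""
    # One connection per worker plus one extra (the "+ 1"),
    # capped by the per-replication http_connections setting.
    return min(worker_processes + 1, http_connections)


def total_replication_connections(num_replications: int,
                                  worker_processes: int = 4,
                                  http_connections: int = 20) -> int:
    """Total HTTP connections used by num_replications replications."""
    return num_replications * connections_per_replication(worker_processes, http_connections)


if __name__ == "__main__":
    # Defaults quoted above: worker_processes = 4, http_connections = 20.
    print(total_replication_connections(100))
    # Compare against the default max_connections of 2048 quoted above.
    print(total_replication_connections(100) <= 2048)
```

With the defaults, 100 replications come to 500 connections, comfortably under 2048; the problem only appears once the number of replications grows.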

In our case each user has two replications: one from the user database to the master database and another from the master database to the user database. So, with the default settings, each user we added brought 2 × 5 = 10 more HTTP connections, which eventually exceeded the default max_connections.
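The same model gives a rough ceiling on the number of users. This hypothetical helper assumes two replications per user, the default settings quoted above, and that replication is the only thing using connections. In practice ordinary client requests also count against max_connections, so the real ceiling is lower.

```python
DEFAULT_MAX_CONNECTIONS = 2048  # default quoted in this answer


def connections_per_user(replications_per_user: int = 2,
                         worker_processes: int = 4,
                         http_connections: int = 20) -> int:
    # user -> master and master -> user, each using
    # min(worker_processes + 1, http_connections) connections.
    return replications_per_user * min(worker_processes + 1, http_connections)


def max_users(max_connections: int = DEFAULT_MAX_CONNECTIONS, **settings) -> int:
    """Largest number of users whose replications fit under max_connections."""
    return max_connections // connections_per_user(**settings)
```

With the defaults, `max_users()` gives 204: the 205th user would bring the total to 2,050 connections, just over 2,048. With `worker_processes=1` and `http_connections=2`, each user costs 4 connections and the same budget covers 512 users.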

Since our replications are minimal and only a small amount of data moves from each user to the master and from the master to each user, we reduced worker_processes and http_connections, increased max_connections, and all is well.
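The answer does not say which values were used. The sketch below renders a local.ini fragment with illustrative values only, assuming the CouchDB 1.x convention that worker_processes and http_connections live in the [replicator] section. max_connections is left out because where it is set varies by CouchDB version; check the documentation for your release before changing it.

```python
import configparser
import io


def replicator_fragment(worker_processes: int, http_connections: int) -> str:
    """Render a [replicator] section for local.ini (values are illustrative)."""
    config = configparser.ConfigParser()
    config["replicator"] = {
        "worker_processes": str(worker_processes),
        "http_connections": str(http_connections),
    }
    buf = io.StringIO()
    config.write(buf)  # writes "key = value" lines under [replicator]
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical "dialed back" values for small, low-volume replications.
    print(replicator_fragment(worker_processes=1, http_connections=2))
```

With these example values each replication is limited to min(1 + 1, 2) = 2 connections instead of 5.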

UPDATE

A couple of other findings:

It was necessary to raise the ulimit on the CouchDB process to allow it to have more open connections.
Creating replications too quickly also caused problems. Slowing down the rate at which I created new replications helped ease the problem. YMMV.
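One way to apply the second finding is to pace replication creation. This is a minimal sketch, assuming a hypothetical `create_replication(source, target)` callback (for example, one that writes a document to the _replicator database) and a database named `master`; the one-second delay is illustrative, not a tested value.

```python
import time
from typing import Callable, Iterable, List, Tuple


def create_replications_throttled(
    user_dbs: Iterable[str],
    create_replication: Callable[[str, str], None],
    master_db: str = "master",
    delay_seconds: float = 1.0,
) -> List[Tuple[str, str]]:
    """Create both replications for each user, pausing between users."""
    created: List[Tuple[str, str]] = []
    for i, user_db in enumerate(user_dbs):
        if i > 0:
            time.sleep(delay_seconds)  # spread out replication start-up
        # One replication in each direction, as described in this answer.
        for source, target in ((user_db, master_db), (master_db, user_db)):
            create_replication(source, target)
            created.append((source, target))
    return created
```

Passing the creation step in as a callback keeps the pacing logic independent of how replications are actually started.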

Tags:

couchdb