I have a single database (300 MB, 42,924 documents) holding about 20 different kinds of documents from about 200 users. The documents range in size from a few bytes to many kilobytes (150 KB or so).
When the server is unloaded, the following replication filter function takes about 2.5 minutes to complete. When the server is loaded, it takes >10 minutes.
Can anyone comment on whether these times are expected, and if not, suggest how I might optimize things in order to get better performance?
function(doc, req) {
    var acceptedDate = true;
    if (doc.date) {
        var docDate = new Date();
        var dateKey = doc.date;
        docDate.setFullYear(dateKey[0], dateKey[1], dateKey[2]);
        var reqYear = req.query.year;
        var reqMonth = req.query.month;
        var reqDay = req.query.day;
        var reqDate = new Date();
        reqDate.setFullYear(reqYear, reqMonth, reqDay);
        acceptedDate = docDate.getTime() >= reqDate.getTime();
    }
    return doc.user_id && doc.user_id == req.query.userid && doc._id.indexOf("_design") != 0 && acceptedDate;
}
Filtered replication is slow because, for each fetched document, a fairly involved round trip runs to decide whether to replicate it or not:

1. CouchDB fetches the next document;
2. the document is encoded to JSON;
3. the JSON is passed to the JavaScript query server over stdio;
4. the query server decodes the JSON and runs the filter function against the document;
5. the query server returns a true or false value to CouchDB;
6. if true, the document is replicated.

For non-filtered replication, take this list, throw away points 2-5, and let point 6 always yield a true result. This overhead slows down the whole replication process.
To significantly improve filtered replication speed, you can use Erlang filters via the native Erlang query server. They run inside CouchDB itself, do not pass data over the stdio interface, and incur no JSON encode/decode overhead.
NOTE: the Erlang query server does not run inside a sandbox like the JavaScript one does, so you need to fully trust any code you run with it.
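As a rough sketch of what such a filter can look like (the user_id check from the JavaScript filter above; the date comparison is omitted for brevity, and the exact EJSON field access shown here is an assumption about the Erlang query server's document representation):

```erlang
%% Hypothetical Erlang counterpart of the JavaScript filter above, stored
%% as a string in the design document's "filters" object, with the design
%% document's "language" field set to "erlang". Documents and the request
%% arrive as EJSON property lists ({[{Key, Value}, ...]}).
fun({Doc}, {Req}) ->
    {Query} = couch_util:get_value(<<"query">>, Req, {[]}),
    UserId  = couch_util:get_value(<<"userid">>, Query),
    case couch_util:get_value(<<"_id">>, Doc) of
        <<"_design/", _/binary>> ->
            false;  % skip design documents
        _ ->
            DocUser = couch_util:get_value(<<"user_id">>, Doc),
            DocUser =/= undefined andalso DocUser =:= UserId
    end
end.
```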
Another option is to optimize the filter function itself, e.g. by reducing object creation and method calls, though you won't gain nearly as much that way.
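For instance (a sketch, assuming doc.date holds a [year, month, day] array as the original filter implies), the two Date allocations per document can be replaced by a field-by-field comparison, and the cheap checks can run first:

```javascript
// Leaner variant of the filter: same checks as the original, but the
// [year, month, day] values are compared directly instead of building
// two Date objects per document. The "var filter =" assignment is only
// here so the function can be exercised outside CouchDB; in the design
// document the anonymous function is stored on its own.
var filter = function(doc, req) {
    // Cheapest rejections first: wrong user or a design document.
    if (!doc.user_id || doc.user_id != req.query.userid) return false;
    if (doc._id.indexOf("_design") == 0) return false;
    if (!doc.date) return true; // undated documents always replicate

    var d = doc.date; // assumed [year, month, day]
    var r = [Number(req.query.year), Number(req.query.month), Number(req.query.day)];
    for (var i = 0; i < 3; i++) {
        if (Number(d[i]) > r[i]) return true;  // doc date is later
        if (Number(d[i]) < r[i]) return false; // doc date is earlier
    }
    return true; // dates are equal, and the original filter keeps >=
};
```

That said, as noted above, trimming the JavaScript only shaves the per-document execution cost; the stdio round trip and JSON encode/decode remain.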