We're using Apache Solr (3.1.0) to index a large number of articles written for multiple sites. We have a master/slave setup (replication config at the bottom), where server 1 indexes the articles and server 2 replicates the index. The slave should poll the master every 60 seconds, but instead we see anywhere from 10 to 75 consecutive /replication calls nearly every time.
Each Solr core (${solr.core.name} in the slave config) represents a different site. The /replication calls I see most are tied to the biggest site. One of the cores only got 1 call per minute, and I've been able to reproduce the problem there by calling update?commit=true a few times, which leads me to think it's related to the number of commits the master performs.
So my question is: how do I stop the Solr slave from replicating the index dozens of times, and force it to replicate just once per minute? I've tried playing with the commitReserveDuration parameter in the master config, but I don't really see any difference.
Master replication config:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>
Slave replication config:
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://${solr.master.server}/search/${solr.core.name}/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
In the master config you set replicateAfter to commit, so if your code issues commits very frequently, each commit triggers replication. I would suggest changing replicateAfter to optimize instead of commit. This should solve your problem. The Solr replication documentation gives more details on the replicateAfter settings.
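As a rough sketch of that suggestion, the master handler from the question might look like the following. Only the first replicateAfter value is changed; everything else is copied from the question, and the comments are mine rather than part of any official example. Note the trade-off: with this setting, slaves should only pick up a new index version after an optimize (or a master restart), so newly committed documents will not reach the slaves until then.

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Changed from "commit": make the index available to slaves after an optimize instead of after every commit -->
    <str name="replicateAfter">optimize</str>
    <!-- Unchanged from the question: also replicate after the master starts up -->
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>
```

Whether this fits depends on how often you optimize. If optimizes are rare, the slave index may lag well behind the master.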