
Apache Solr: Slave replicates 10+ times every time it polls (excessive commits?)

We're using Apache Solr (3.1.0) to index a large number of articles written for multiple sites. We have a master/slave setup (replication config at the bottom), where server 1 indexes the articles and server 2 replicates the index. The slave should poll the master every 60 seconds, but instead we see anywhere from 10 to 75 consecutive /replication calls nearly every time it polls.

Each Solr core (${solr.core.name} in the slave config) represents a different site. The /replication calls I see most often are tied to the biggest site. One of the cores normally gets only 1 call per minute, and I was able to reproduce the problem there by calling update?commit=true a few times, which leads me to think it's related to the number of commits the master performs.

So my question is, how do I stop the Solr slave from replicating the index dozens of times and force it to replicate just once per minute? I've tried playing with the commitReserveDuration parameter in the master config, but I don't really see any difference.

master replication config:

 <requestHandler name="/replication" class="solr.ReplicationHandler" >
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
   </lst>
 </requestHandler>

slave replication config:

 <requestHandler name="/replication" class="solr.ReplicationHandler" >
   <lst name="slave">
     <str name="masterUrl">http://${solr.master.server}/search/${solr.core.name}/replication</str>
     <str name="pollInterval">00:00:60</str>
   </lst>
 </requestHandler>

Ivo van der Veeken, asked Feb 26 '16 13:02



1 Answer

Your master config sets replicateAfter to commit, so if your code issues commits very frequently, those commits will trigger replication. I would suggest changing replicateAfter to optimize instead of commit. This should solve your problem. See the Solr replication documentation for more details on the replicateAfter settings.
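
As a minimal sketch of that change, the master config from the question would look roughly like this, with only the first replicateAfter value swapped. It assumes the rest of the handler stays as posted and that an optimize is actually run on the master at some regular interval (how often is up to you and is not something the question specifies):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler" >
  <lst name="master">
    <!-- Publish a new replicable index version only after an optimize,
         not after every commit (assumption: optimize runs periodically) -->
    <str name="replicateAfter">optimize</str>
    <str name="replicateAfter">startup</str>
  </lst>
</requestHandler>
```

Keep in mind the trade-off: with this setting the slave should only pick up new documents after the master has been optimized (or restarted), so frequent commits on the master will no longer be visible on the slave until then.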

Adarsh H D Dev, answered Oct 22 '22 01:10