
Solr appears to block update requests while committing

Tags: timeout, solr

We're running a master-slave setup with Solr 3.6 using the following auto-commit options:

maxDocs: 500000

maxTime: 600000

We have approximately 5 million documents in our index, which takes up about 550GB. We're running both master and slave on Amazon EC2 XLarge instances (4 virtual cores and 15GB of memory). We don't have a particularly high write throughput: about 100 new documents per minute.
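To make the interaction between these settings and the write rate concrete, here is a rough back-of-the-envelope sketch. It only uses the numbers quoted above; the function and variable names are made up for illustration, and it assumes a steady write rate and that a commit is triggered by whichever limit is reached first.

```python
# Values taken from the question; the names are illustrative, not Solr identifiers.
MAX_DOCS = 500_000       # autoCommit maxDocs
MAX_TIME_MS = 600_000    # autoCommit maxTime, in milliseconds
DOCS_PER_MINUTE = 100    # approximate write rate, assumed steady


def minutes_until_autocommit(max_docs, max_time_ms, docs_per_minute):
    """Return (minutes, trigger) for whichever auto-commit limit is hit first."""
    minutes_by_docs = max_docs / docs_per_minute
    minutes_by_time = max_time_ms / 60_000
    if minutes_by_time <= minutes_by_docs:
        return minutes_by_time, "maxTime"
    return minutes_by_docs, "maxDocs"


minutes, trigger = minutes_until_autocommit(MAX_DOCS, MAX_TIME_MS, DOCS_PER_MINUTE)
docs_per_commit = minutes * DOCS_PER_MINUTE
print(f"{trigger} fires first, after {minutes:.0f} min (~{docs_per_commit:.0f} docs)")
```

Under these assumptions the time limit is reached long before the document limit, so a commit would be due roughly every 10 minutes, with about 1,000 pending documents each time.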

We're using Jetty as a container which has 6GB allocated to it.

The problem is that once a commit has started, all our update requests start timing out (we're not performing queries against this box). The commit itself appears to take approximately 20-25 minutes, during which time we're unable to add any new documents to Solr.

One of the answers to the following question suggests using 2 cores and swapping them once one is fully updated. However, this seems a little over the top.

Solr requests time out during index update. Perhaps replication a possible solution?

Is there anything else I should be looking at regarding why Solr seems to be blocking requests? I'm optimistically hoping there's a "dontBlockUpdateRequestsWhenCommitting" flag in the config that I've overlooked...

Many thanks,

Kevin asked Nov 25 '22 23:11



1 Answer

Given the bounty reason and the problem described in the question, here is a solution that Solr itself provides:

Starting with the 4.x releases, Solr has a capability called SolrCloud. Instead of the previous master/slave architecture, there are leaders and replicas. Leaders coordinate indexing for their shard, and replicas can answer queries. The cluster is managed by ZooKeeper. If a leader goes down, one of its replicas is elected as the new leader.

All in all, if you want to split up your indexing process, SolrCloud does that automatically: there is one leader per shard, and each leader is responsible for indexing its own shard's documents. When you send a query into the system, there will be some Solr nodes (provided there are more Solr nodes than shards) that are not responsible for indexing but are ready to answer the query. Adding more replicas gives you faster query results, at the cost of more inbound network traffic when indexing, among other things.
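The idea of "one leader per shard" can be illustrated with a small toy model. This is only a sketch of the concept, not how Solr implements it: the cluster layout and node names are hypothetical, the MD5-based hash is a stand-in for Solr's own document router, and the failover function merely imitates what ZooKeeper-driven leader election achieves.

```python
import hashlib

# Hypothetical layout: shard name -> nodes, with the first entry acting as leader.
CLUSTER = {
    "shard1": ["node-a", "node-b"],
    "shard2": ["node-c", "node-d"],
}


def shard_for(doc_id, shard_names):
    """Pick a shard by hashing the document id (stand-in for Solr's router)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(shard_names)
    return sorted(shard_names)[index]


def leader_for(doc_id, cluster):
    """Return (shard, leader node) that would index the given document."""
    shard = shard_for(doc_id, list(cluster))
    return shard, cluster[shard][0]


def promote_replica(cluster, shard):
    """Toy failover: return a new layout where the next replica leads the shard."""
    remaining = cluster[shard][1:]
    if not remaining:
        raise RuntimeError(f"no replica left to lead {shard}")
    updated = dict(cluster)
    updated[shard] = remaining
    return updated


shard, leader = leader_for("doc-42", CLUSTER)
print(f"doc-42 is indexed via {leader}, the leader of {shard}")
```

Each document id always maps to the same shard, so indexing work is spread across the shard leaders, while the other nodes in each shard remain available for queries.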

kamaci answered Dec 15 '22 07:12
