Loadbalancer and Solrcloud

Tags:

solrcloud

I am wondering how a load balancer can be set up on top of SolrCloud, or whether a load balancer is not needed at all.

If the former, do the shard leaders need to be added to the load balancer? Then what happens if a shard leader changes for some reason? Or is it better to add all machines in the cluster (including replicas) to the load balancer?

If the latter, I guess a CNAME needs to point to the SolrCloud cluster, and it should use round-robin DNS?

Any advice from actual SolrCloud operational experience would be really appreciated.

720

asked Mar 20 '14 04:03

kee

4 Answers

SolrCloud is usually used in combination with ZooKeeper, and the client uses CloudSolrServer to access SolrCloud.

A query is processed in the following flow.

Note that I have only read part of the Solr source code, so much of this is guesswork. What I read was the Solr 4.1 source, so it may be outdated.

1. ZooKeeper holds the list of IPAddress:Port entries for all SolrCloud servers.
2. (Client side) The CloudSolrServer instance retrieves the list of servers from ZooKeeper.
3. (Client side) The CloudSolrServer instance chooses one of the SolrCloud servers at random and sends the query to it. (LBHttpSolrServer may choose the server in round-robin fashion instead?)
4. (Server side) The SolrCloud server that received the query chooses replicas at random from the server list (one server per shard) and forwards the query to them. (Note that every SolrCloud server holds the server list, which it receives from ZooKeeper.)
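The flow above can be sketched as a small simulation. This is not Solr code: the cluster state dictionary, the addresses, and the function names are all made up for illustration, and the random choices only mirror the guesses described in the steps.

```python
import random

# Hypothetical cluster state, standing in for what ZooKeeper would hold.
# Shard names and addresses are invented for this sketch.
CLUSTER_STATE = {
    "shard1": ["10.0.0.1:8983", "10.0.0.2:8983"],
    "shard2": ["10.0.0.3:8983", "10.0.0.4:8983"],
}


def all_nodes(state):
    """Flatten the cluster state into a sorted list of unique node addresses."""
    return sorted({node for replicas in state.values() for node in replicas})


def client_pick_node(state, rng):
    """Client side: choose one node at random to receive the query."""
    return rng.choice(all_nodes(state))


def server_pick_replicas(state, rng):
    """Server side: choose one replica per shard to forward the query to."""
    return {shard: rng.choice(replicas) for shard, replicas in state.items()}


def route_query(state, rng=None):
    """Return the entry node and the per-shard replicas it would query."""
    if rng is None:
        rng = random.Random()
    entry_node = client_pick_node(state, rng)
    return entry_node, server_pick_replicas(state, rng)


if __name__ == "__main__":
    entry_node, fan_out = route_query(CLUSTER_STATE)
    print("entry node:", entry_node)
    for shard, replica in sorted(fan_out.items()):
        print(shard, "->", replica)
```

Every node is a valid entry point in this model, which is why the client does not need to know which node is a shard leader.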

An update is handled in the same manner, but it is also propagated to all servers.

Note that in SolrCloud there is little difference between a leader and a replica, and we can send a query or update to any server. It is automatically redirected to the other servers.

In short, load balancing is done on both the client side and the server side, so you don't need to worry about it.

170

answered Sep 28 '22 12:09

ymonad

A load balancer is needed, and it would be implemented by ZooKeeper used in conjunction with SolrCloud.

When you use SolrCloud you must set up sharding and replication through ZooKeeper, using either the embedded ZooKeeper server that comes bundled with SolrCloud or a stand-alone ZooKeeper ensemble (which is recommended for redundancy).

Then you would use CloudSolrClient (CloudSolrServer in Solr 4.x), which reads the cluster state from ZooKeeper and sends your query to an appropriate node in your cluster. The client requires the addresses of all your ZooKeeper instances upon instantiation, and load balancing is handled from there.

Please see the following excellent tutorial: http://www.francelabs.com/blog/tutorial-solrcloud-amazon-ec2/

Solr Docs: https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble

answered Sep 28 '22 13:09

jpalmer4444

The following quotes refer to the latest version of Solr, which was 7.1 at the time of writing.

SolrCloud - Distributed Requests

When a Solr node receives a search request, the request is routed behind the scenes to a replica of a shard that is part of the collection being searched.

The chosen replica acts as an aggregator: it creates internal requests to randomly chosen replicas of every shard in the collection, coordinates the responses, issues any subsequent internal requests as needed (for example, to refine facets values, or request additional stored fields), and constructs the final response for the client.

SolrCloud - Read Side Fault Tolerance

In a SolrCloud cluster each individual node load balances read requests across all the replicas in collection. You still need a load balancer on the 'outside' that talks to the cluster, or you need a smart client which understands how to read and interact with Solr’s metadata in ZooKeeper and only requests the ZooKeeper ensemble’s address to start discovering to which nodes it should send requests. (Solr provides a smart Java SolrJ client called CloudSolrClient.)
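To illustrate what a load balancer "on the outside" does, here is a minimal sketch of round-robin selection that skips nodes marked as down. The class and method names are hypothetical, and a real deployment would normally use an existing load balancer with health checks rather than code like this.

```python
class RoundRobinBalancer:
    """Rotate through Solr nodes, skipping any that are marked down."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._down = set()
        self._next = 0

    def mark_down(self, node):
        """Record that a node failed a health check."""
        self._down.add(node)

    def mark_up(self, node):
        """Record that a node is healthy again."""
        self._down.discard(node)

    def pick(self):
        """Return the next live node, or raise if every node is down."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node not in self._down:
                return node
        raise RuntimeError("no live Solr nodes")


if __name__ == "__main__":
    # Addresses are placeholders for this sketch.
    lb = RoundRobinBalancer(["10.0.0.1:8983", "10.0.0.2:8983", "10.0.0.3:8983"])
    lb.mark_down("10.0.0.2:8983")
    for _ in range(4):
        print(lb.pick())
```

Because any node can coordinate a distributed request, the balancer can list every node in the cluster and does not need to track shard leaders.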

answered Sep 28 '22 13:09

freedev

I am in a similar situation where I can't rely on CloudSolrServer for load balancing. A possible solution that I am evaluating is to use Airbnb's Synapse (http://nerds.airbnb.com/smartstack-service-discovery-cloud/) to dynamically reconfigure an existing HAProxy load balancer based on the status of the SolrCloud cluster that we get from ZooKeeper.
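The idea can be sketched roughly as follows. This is not how Synapse itself works; it is only a sketch of turning a list of live nodes into an HAProxy backend section. It assumes node names have the form `host:port_context` (for example `10.0.0.1:8983_solr`), and the function names, backend name, and server labels are hypothetical.

```python
def node_address(node_name):
    """Turn 'host:port_context' into 'host:port' (assumed name format)."""
    host, _, rest = node_name.rpartition(":")
    port = rest.split("_", 1)[0]
    return f"{host}:{port}"


def render_backend(live_nodes, backend_name="solr_nodes"):
    """Render an HAProxy backend section listing every live node."""
    lines = [f"backend {backend_name}", "    balance roundrobin"]
    for index, node in enumerate(sorted(live_nodes)):
        lines.append(f"    server solr{index} {node_address(node)} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # In practice this list would come from ZooKeeper; here it is hard-coded.
    print(render_backend(["10.0.0.2:8983_solr", "10.0.0.1:8983_solr"]))
```

Regenerating this section and reloading HAProxy whenever the list of live nodes changes would keep the load balancer in step with the cluster.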

answered Sep 28 '22 12:09

Luca
