Zookeeper and SolrCloud on AWS EC2 instances

I have used Solr for a while, but am new to SolrCloud. I am investigating whether it makes sense in my context to deploy SolrCloud or to have multiple Solr instances (with matching indexed content) sitting behind an ELB.

My deployment will be in AWS on EC2 instances. Our current troubleshooting strategy in AWS is to terminate misbehaving instances and allow them to be automatically recreated by an AutoScaling group (which configures new instances via scripts when they are created). In fact, we do not have access to log on to the instances once they are in production. Everything stored in Solr can be re-indexed, so there is not a concern for data loss.

When trying to understand the SolrCloud infrastructure, however, I had a few questions:

  • Is Zookeeper able to automatically add a new instance if I destroy one of them? Everything I have seen seems to have static IP addresses in the configurations, which would require the configs to be updated (and Zookeeper restarted) if an instance was terminated and replaced.
  • Is there a "master" Zookeeper instance that I should call, or can I call any of them? If I can call any of them, we would likely put an ELB in front of Zookeeper.
  • If we hit heavy usage and allow the AWS AutoScaling group to create additional servers that serve as SolrCloud shards, will SolrCloud gracefully add the instances and terminate them without problems? (This appears to be true, and the whole point of using SolrCloud.)
Josh Edwards asked Aug 04 '15 15:08


1 Answer

  • Is Zookeeper able to automatically add a new instance if I destroy one of them? Everything I have seen seems to have static IP addresses in the configurations, which would require the configs to be updated (and Zookeeper restarted) if an instance was terminated and replaced.

AN: Each ZooKeeper's configuration only has to list the other ZooKeepers in the ensemble, so that they are aware of one another. You don't need to change this configuration unless you plan to increase or decrease the number of ZooKeepers, and even then you can do it without disturbing the cluster by updating one node at a time. We also use hostnames rather than IP addresses in the configuration, so a change of IP has no impact as long as the replacement instance comes up under the same hostname.
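
As a rough illustration of the hostname-based configuration described above, this Python sketch renders the `server.N` lines of a `zoo.cfg`. The hostnames (`zk1.example.internal` and so on) are hypothetical, and 2888/3888 are the peer and leader-election ports conventionally shown in ZooKeeper examples, not values taken from this deployment.

```python
def render_server_lines(hostnames, peer_port=2888, election_port=3888):
    """Return the zoo.cfg 'server.N=host:peer:election' lines for an ensemble."""
    lines = []
    for server_id, host in enumerate(hostnames, start=1):
        lines.append(f"server.{server_id}={host}:{peer_port}:{election_port}")
    return "\n".join(lines)


# Hypothetical DNS names; a replacement instance would reuse the same name
ensemble = ["zk1.example.internal", "zk2.example.internal", "zk3.example.internal"]
print(render_server_lines(ensemble))
```

Each ZooKeeper also needs a `myid` file whose number matches its `server.N` entry, so a script that configures a replacement instance would have to restore that as well.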

  • Is there a "master" Zookeeper instance that I should call, or can I call any of them? If I can call any of them, we would likely put an ELB in front of Zookeeper.

AN: ZooKeeper has a leader and followers, but we don't need to care which is which, because our application doesn't communicate with the ZooKeepers directly.

  • If we hit heavy usage and allow the AWS AutoScaling group to create additional servers that serve as SolrCloud shards, will SolrCloud gracefully add the instances and terminate them without problems? (This appears to be true, and the whole point of using SolrCloud.)

AN: When you create a new Solr node, you have to start it as part of the same cluster (pass it the same ZooKeepers). Once it is running, you have to split a shard and move it to the new node to balance the cluster. This is not automated as of now.
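
The manual split-and-move step could be scripted against Solr's Collections API. The sketch below only builds the request URLs; it assumes the `SPLITSHARD` and `ADDREPLICA` actions, and the host, collection, and node names are hypothetical. The names `shard1_0` and `shard1_1` are the sub-shards Solr creates when `shard1` is split.

```python
from urllib.parse import urlencode


def collections_api_url(solr_base, action, **params):
    """Build a Collections API URL such as .../admin/collections?action=SPLITSHARD&..."""
    query = urlencode({"action": action, **params})
    return f"{solr_base}/admin/collections?{query}"


# Hypothetical values for illustration only
base = "http://solr1.example.internal:8983/solr"

# Step 1: split shard1 into shard1_0 and shard1_1
split_url = collections_api_url(base, "SPLITSHARD", collection="products", shard="shard1")

# Step 2: place a replica of one sub-shard on the newly started node
add_url = collections_api_url(
    base, "ADDREPLICA", collection="products", shard="shard1_0",
    node="solr4.example.internal:8983_solr",
)

print(split_url)
print(add_url)
```

Once the new replica is active, the old copy can be dropped with a `DELETEREPLICA` call, which completes the "move". Check the parameter names against the reference guide for your Solr version before relying on this.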

The Solr nodes are what you add to your ELB.

When you start a Solr node, you pass it the list of ZooKeepers. That is how the node knows which cluster it is part of and which other nodes are serving that cluster.
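
A minimal sketch of how an instance bootstrap script might assemble that ZooKeeper list and the start command. The hostnames and the `/solr` chroot are hypothetical; `-c`, `-z`, and `-p` are `bin/solr start` options in the Solr 5.x to 8.x line, so verify them against your version.

```python
def zk_host_string(hostnames, client_port=2181, chroot=""):
    """Join ZooKeeper hosts into the comma-separated zkHost string Solr expects."""
    hosts = ",".join(f"{host}:{client_port}" for host in hostnames)
    # An optional chroot path is appended once, after the last host
    return hosts + chroot


def solr_start_command(zk_host, solr_port=8983):
    """Return the argv list for starting a Solr node in cloud mode."""
    # -c: cloud mode, -z: ZooKeeper connection string, -p: Solr port
    return ["bin/solr", "start", "-c", "-z", zk_host, "-p", str(solr_port)]


# Hypothetical ensemble; every Solr node in the cluster gets the same string
zk_host = zk_host_string(
    ["zk1.example.internal", "zk2.example.internal", "zk3.example.internal"],
    chroot="/solr",
)
print(" ".join(solr_start_command(zk_host)))
```

Because every node receives the same connection string, an AutoScaling launch script can hard-code it or read it from instance metadata or tags.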

Aneesh Mon N answered Sep 26 '22 01:09