What is the interaction between Solr and Zookeeper?

I've been working on a system where they use SolrCloud, which entails a Zookeeper ensemble that helps "manage the overall structure so that both indexing and search requests can be routed properly" (straight out of the Solr documentation).

What exactly is this "management"? What data or configuration do the machines running Solr read from and write to the Zookeeper ensemble, and why? Is the data in Zookeeper ever changed at runtime by Solr? Or do you configure "the data" once, after which the SolrCloud hosts only read it at runtime?

To put the question into perspective, this is my first contact with Zookeeper, Solr, and in many ways with distributed systems.

Lucio Assis asked Sep 20 '17 22:09



1 Answer

A single-node Solr instance uses its own configuration files, usually kept in a conf folder containing files such as schema.xml and stopwords.txt. In SolrCloud, however, a collection is a logical index made up of a group of cores. These cores need centralised configuration: every core belonging to the same collection shares the same configuration. ZooKeeper is a centralised service for maintaining configuration information in a distributed system, which is why SolrCloud keeps this shared configuration there.

You can upload, download, and edit these configuration files in ZooKeeper, so that all cores belonging to the same collection get the same config set.
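To make the idea of a shared config set more concrete, here is a minimal sketch of how a local conf folder could map onto ZooKeeper nodes. It assumes the layout in which each config set lives under `/configs/<name>`. The function name `plan_configset_upload` and the config name `my_config` are made up for illustration. The code only computes the target paths and does not talk to ZooKeeper.

```python
import os
import tempfile


def plan_configset_upload(conf_dir, config_name, zk_root="/configs"):
    """Map each local file under conf_dir to the znode path it would occupy.

    Hypothetical helper: it only builds a plan (znode path -> local file)
    and never connects to a ZooKeeper ensemble.
    """
    plan = {}
    for dirpath, _dirnames, filenames in os.walk(conf_dir):
        for filename in filenames:
            local_path = os.path.join(dirpath, filename)
            # Path relative to the conf folder, using "/" as znode separator.
            rel = os.path.relpath(local_path, conf_dir).replace(os.sep, "/")
            plan[f"{zk_root}/{config_name}/{rel}"] = local_path
    return plan


def demo():
    """Build a throwaway conf folder and return the planned znode paths."""
    with tempfile.TemporaryDirectory() as conf_dir:
        for name in ("schema.xml", "stopwords.txt"):
            with open(os.path.join(conf_dir, name), "w") as fh:
                fh.write("placeholder\n")
        return sorted(plan_configset_upload(conf_dir, "my_config"))


if __name__ == "__main__":
    for znode in demo():
        print(znode)
```

In practice the upload itself is normally done with Solr's own tooling, for example `bin/solr zk upconfig -n my_config -d /path/to/conf -z localhost:2181`, where the config name, directory, and ZooKeeper address shown here are placeholders. Every core in a collection that references `my_config` then reads the same files from ZooKeeper.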

You can read more about SolrCloud config management here.

mjlowky answered Sep 21 '22 17:09