 

Configure DataImportHandler in SolrCloud with ZooKeeper

I have a SolrCloud set up as described in "exploration of SolrCloud"; the only difference is that I use Solr 4.0.0 Beta. In short, the configuration is:

  • ZooKeeper on default port 2181
  • 3 instances of Solr running on different ports

This is just for testing purposes. The target configuration has 3 ZooKeeper instances (one for every Solr instance). I managed to index some XML files with a curl command.

Questions:

  1. How can I configure DIH for the collection? I changed solrconfig.xml (adding the config for the DataImportHandler) and added the proper JDBC driver to lib, but the Solr admin shows "sorry, no dataimport-handler defined!". The changes are visible in ZooKeeper (I can see data_config.xml there), and the Solr admin panel shows the updated version of solrconfig.xml.

  2. Is there a good tutorial for a production deployment of SolrCloud (with something like the desired configuration mentioned above) on one or more machines running Ubuntu 12.04 LTS?

Any advice would be appreciated! Thanks in advance!

asked by vuky, Sep 04 '12 13:09



People also ask

How does ZooKeeper work with Solr?

In SolrCloud, Solr uses ZooKeeper to keep track of which servers host which shards and replicas, along with the shared configuration files and schemas. Queries and updates can be sent to any server; Solr uses the cluster state stored in ZooKeeper to work out which servers need to handle the request.

What is ZooKeeper ensemble?

An ensemble is simply a cluster of ZooKeeper servers, and the quorum is the rule for keeping it healthy: a majority of the servers must be up and in agreement. An ensemble of 2N+1 servers can therefore tolerate N failed servers, because the remaining N+1 still form a majority.
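Writing the majority rule out for an ensemble of $n$ servers, with $q$ the quorum size and $f$ the number of servers that may fail:

$$
q(n) = \left\lfloor \frac{n}{2} \right\rfloor + 1, \qquad f(n) = n - q(n) = \left\lceil \frac{n}{2} \right\rceil - 1
$$

For $n = 2N+1$ this gives $q = N+1$ and $f = N$. With the 3 ZooKeeper instances mentioned in the question, $q = 2$ and $f = 1$; a fourth server gives $q = 3$ and still only $f = 1$, which is why odd ensemble sizes are the usual choice.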


1 Answer

Normally the DIH configuration has nothing to do with whether you're running a single Solr instance or several instances in a SolrCloud setup. DIH sends the imported documents to the Solr instance it runs on, and SolrCloud then distributes them to the right shards and replicas using the cluster state kept in ZooKeeper (ZooKeeper itself does not copy any index data).

Make sure your DIH is properly configured:

In solrconfig.xml, make sure all the necessary libraries are loaded. That means the two DIH jars:

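<!-- dir is resolved relative to the core's instance directory; the regex matches whichever DIH version is in dist/ -->
<lib dir="../../../dist/" regex=".*dataimporthandler-\d.*\.jar" />
<lib dir="../../../dist/" regex=".*dataimporthandler-extras-\d.*\.jar" />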

as well as any other jars you need (such as your database's JDBC driver).
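For the JDBC driver, one option is another top-level `<lib>` directive in solrconfig.xml. A minimal sketch, assuming a hypothetical directory `/opt/solr-jdbc/` and a hypothetical jar name `ojdbc6.jar`; because `<lib>` paths are read from each node's local disk rather than from ZooKeeper, the driver jar has to be present on every Solr node in the cloud.

```xml
<!-- Hypothetical driver location: load every jar found in this local directory. -->
<lib dir="/opt/solr-jdbc/" regex=".*\.jar" />

<!-- Alternative (use one or the other): reference a single jar by path.
<lib path="/opt/solr-jdbc/ojdbc6.jar" />
-->
```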

Also in solrconfig.xml, make sure the DIH request handler is declared, something like this:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">data-config.xml</str>
    </lst>
</requestHandler>

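Finally, the config file named in the DIH handler (data-config.xml) must be in the same "conf" directory as solrconfig.xml (in SolrCloud, the config set uploaded to ZooKeeper), its name must match exactly (data-config.xml is not the same as data_config.xml), and it should have proper content, something like: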

<dataConfig>

<dataSource type="JdbcDataSource" name="myDataSource" driver="oracle.jdbc.driver.OracleDriver" url="jdbc:oracle:thin:@someHost:1521:someDb" user="someUser" password="somePassword" batchSize="5000"/>  

<document name="myDoc" >
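    <!-- dataSource must match the name declared above exactly; the transformer is an optional custom class (placeholder) -->
    <entity name="myDoc" dataSource="myDataSource" transformer="my.custom.Transformer" query="select col1, col2, col3 from table1 where whatever" />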
</document>

</dataConfig>
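DIH maps each column returned by the query onto the schema field with the same name. When the names differ, `<field>` elements inside the entity make the mapping explicit. A minimal sketch: the table, column, and field names (`table1`, `col1`..`col3`, `id`, `title`, `body`) are placeholders, and the target fields are assumed to exist in schema.xml with `id` as the uniqueKey (in SolrCloud each document normally needs a uniqueKey value so it can be routed to a shard).

```xml
<dataConfig>
  <!-- Placeholder connection details, as in the example above. -->
  <dataSource type="JdbcDataSource" name="myDataSource"
              driver="oracle.jdbc.driver.OracleDriver"
              url="jdbc:oracle:thin:@someHost:1521:someDb"
              user="someUser" password="somePassword" batchSize="5000"/>
  <document name="myDoc">
    <entity name="myDoc" dataSource="myDataSource"
            query="select col1, col2, col3 from table1">
      <!-- column = name in the SQL result, name = field in schema.xml (hypothetical fields) -->
      <field column="col1" name="id"/>
      <field column="col2" name="title"/>
      <field column="col3" name="body"/>
    </entity>
  </document>
</dataConfig>
```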
answered by Shivan Dragon, Sep 23 '22 12:09
