
How can I schedule data imports in Solr?

Tags:

solr

The wiki page http://wiki.apache.org/solr/DataImportHandler explains how to index data using the DataImportHandler, but its example uses a command to initiate the import operation. How can I schedule a job to do this on a regular basis?

Eldo asked Jul 08 '10 17:07



People also ask

What is full-import and Delta-import in Solr?

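In other words, a full-import executes exactly 1 query for each defined entity plus N queries for each sub-entity (one per parent row), while a delta-import executes 1 query to get the entity's list of changed elements, plus N queries to fetch each changed element, plus another N queries for each defined sub-entity. For example, with one entity returning 1,000 rows and one sub-entity, a full-import runs 1 + 1,000 = 1,001 queries, whereas a delta-import that finds 10 changed rows runs 1 + 10 + 10 = 21.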

What is dih in Solr?

The Data Import Handler (DIH) provides a mechanism for importing content from a data store and indexing it. In addition to relational databases, DIH can index content from HTTP-based data sources such as RSS and ATOM feeds, e-mail repositories, and structured XML where an XPath processor is used to generate fields.


2 Answers

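On UNIX/Linux, cron jobs are your friends: have one send the import request (a plain HTTP GET to the DataImportHandler URL) on whatever schedule you need. On Windows, there is Task Scheduler.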

UPDATE
To do it from Java code: since this is a simple GET request, you can use the Apache Commons HttpClient library (see its tutorial on using GetMethod).
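A minimal sketch of that GET request, assuming Java 11+ and swapping in the JDK's built-in java.net.http.HttpClient instead of the Commons HttpClient GetMethod. The base URL http://localhost:8983/solr and the /dataimport handler path are assumptions borrowed from the SolrJ example; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: triggers a DataImportHandler command with a plain HTTP GET.
final class DataImportTrigger {

    // Builds e.g. http://localhost:8983/solr/dataimport?command=full-import
    // (assumes the DIH is registered at /dataimport).
    static URI buildImportUri(String solrBaseUrl, String command) {
        String encoded = URLEncoder.encode(command, StandardCharsets.UTF_8);
        return URI.create(solrBaseUrl + "/dataimport?command=" + encoded);
    }

    // Sends the GET request and returns the HTTP status code Solr answered with.
    static int trigger(HttpClient client, String solrBaseUrl, String command)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(buildImportUri(solrBaseUrl, command))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        int status = trigger(client, "http://localhost:8983/solr", "full-import");
        System.out.println("DIH responded with HTTP " + status);
    }
}
```

On a multi-core setup the core name would go into the base URL (for example http://localhost:8983/solr/mycore, where mycore is a placeholder). A program like this can then be launched by cron or Task Scheduler.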

If you need to send other requests to Solr programmatically, you should probably use the SolrJ library. It lets you send all the basic commands to Solr, and it can be configured to access any Solr handler:

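```java
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

// SolrJ 1.4/3.x API; newer SolrJ versions use HttpSolrServer, then HttpSolrClient
CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("command", "full-import");   // or "delta-import"
QueryRequest request = new QueryRequest(params);
request.setPath("/dataimport");          // route the request to the DIH handler
server.request(request);
```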
Pascal Dimassimo answered Sep 21 '22 15:09



I was able to make it work by following these steps:

  1. Create the classes ApplicationListener, HTTPPostScheduler, and SolrDataImportProperties (source code listed at http://wiki.apache.org/solr/DataImportHandler#Scheduling). I believe these classes haven't been committed to Solr yet.

  2. Add the following listener to Solr's web.xml file:

    <listener>
       <listener-class>org.apache.solr.handler.dataimport.scheduler.ApplicationListener</listener-class>
    </listener>
    
  3. Configure dataimport.properties as described on the wiki page.
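In essence, the listener-based approach runs a timer inside the servlet container that fires the import request at a fixed interval. The sketch below shows that idea with the JDK's ScheduledExecutorService; it is not the wiki's HTTPPostScheduler code, and the class name, the 10-minute interval, and the placeholder task are assumptions for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process scheduler: runs an import task repeatedly.
final class ImportScheduler implements AutoCloseable {

    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor();

    // Runs importTask after initialDelay, then again `delay` after each run finishes.
    ScheduledFuture<?> start(Runnable importTask, long initialDelay, long delay, TimeUnit unit) {
        Runnable safeTask = () -> {
            try {
                importTask.run();
            } catch (RuntimeException e) {
                // An uncaught exception would suppress all later runs, so log and continue.
                System.err.println("Scheduled import failed: " + e);
            }
        };
        return executor.scheduleWithFixedDelay(safeTask, initialDelay, delay, unit);
    }

    // Stops scheduling; in a web app, call this when the context shuts down.
    @Override
    public void close() {
        executor.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (ImportScheduler scheduler = new ImportScheduler()) {
            // Placeholder task: in practice, send the DIH request here
            // (e.g. GET http://localhost:8983/solr/dataimport?command=delta-import).
            scheduler.start(() -> System.out.println("triggering delta-import"),
                    0, 10, TimeUnit.MINUTES);
            Thread.sleep(1000); // keep the demo alive briefly
        }
    }
}
```

Fixed delay leaves a full interval between the end of one import request and the start of the next; scheduleAtFixedRate would instead keep a strict cadence and may start runs back-to-back after a slow one. Inside a web app, start() would typically be called from a ServletContextListener's contextInitialized and close() from contextDestroyed.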

LeoO answered Sep 21 '22 15:09
