
Solr DataImportHandler delta import queries

Tags:

solr

The delta import syntax for the Solr 1.4 data import handler allows for up to 4 queries (query, deltaImportQuery, deltaQuery & parentDeltaQuery), but I am unclear on the usage of the "query" query.

In the following example, the "query" query does the same as the deltaImportQuery without the where clause.

<entity name="data-table" pk="id"
        query="select id,Subject,Text,UserID,CreatedDate,TopicID,TopicType,EPiPageID,ForumID,Room1ID,Room1Name,LastModifiedDate from dbo.CustomForumPosts"
        deltaImportQuery="select id,Subject,Text,UserID,CreatedDate,TopicID,TopicType,EPiPageID,ForumID,Room1ID,Room1Name,LastModifiedDate from dbo.CustomForumPosts where id='${dataimporter.delta.id}'"
        deltaQuery="select id from dbo.CustomForumPosts where LastModifiedDate > '${dataimporter.last_index_time}'">
</entity>

I don't understand why, or if, I need the "query" query - it would appear to do nothing more than describe the full import equivalent of this delta. Can anyone explain?

Jason asked Aug 04 '10 09:08


1 Answer

The "query" attribute is the query used for a full import, as you implied. The documentation says:

  • The query gives the data needed to populate fields of the Solr document in full-import
  • The deltaImportQuery gives the data needed to populate fields when running a delta-import
  • The deltaQuery gives the primary keys of the current entity which have changes since the last index time

http://wiki.apache.org/solr/DataImportHandler#Using_delta-import_command
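
To make the division of labour concrete, here is a rough Python simulation of the two commands. It is not Solr code, only a sketch of the flow described above: the in-memory SQLite table, the trimmed column list, and the function names full_import and delta_import are all made up for illustration, and "?" placeholders stand in for ${dataimporter.last_index_time} and ${dataimporter.delta.id}.

```python
import sqlite3

# Hypothetical stand-in for dbo.CustomForumPosts, trimmed to three columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, subject TEXT, last_modified TEXT)"
)
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, "old post", "2010-08-01 00:00:00"),
        (2, "edited post", "2010-08-04 09:00:00"),
        (3, "new post", "2010-08-04 09:05:00"),
    ],
)

# The three entity attributes, with "?" in place of the DIH variables.
QUERY = "SELECT id, subject FROM posts ORDER BY id"
DELTA_QUERY = "SELECT id FROM posts WHERE last_modified > ? ORDER BY id"
DELTA_IMPORT_QUERY = "SELECT id, subject FROM posts WHERE id = ?"


def full_import(conn):
    """Full import: run `query` once and take every row it returns."""
    return conn.execute(QUERY).fetchall()


def delta_import(conn, last_index_time):
    """Delta import: `deltaQuery` lists changed ids, then
    `deltaImportQuery` fetches the fields for each of those ids."""
    changed_ids = [
        row[0] for row in conn.execute(DELTA_QUERY, (last_index_time,))
    ]
    docs = []
    for changed_id in changed_ids:
        docs.extend(conn.execute(DELTA_IMPORT_QUERY, (changed_id,)).fetchall())
    return docs


print(full_import(conn))
print(delta_import(conn, "2010-08-03 00:00:00"))
```

With a last index time of 2010-08-03, the delta run only touches posts 2 and 3, while the full run reads all three rows. That is why "query" and deltaImportQuery look nearly identical in the question: they select the same fields, but only the latter is restricted to a single changed id.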

John P answered Sep 20 '22 15:09