How to store tree data in a Lucene/Solr/Elasticsearch index or a NoSQL db?

Say instead of documents I have small trees that I need to store in a Lucene index. How do I go about doing that?

An example node in the tree:

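import java.util.List;

class Node
{
    String data;          // space-separated words; must be full-text searchable
    String type;          // a single word
    List<Node> children;  // child nodes of this node
}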

In the above node, the "data" member variable is a space-separated string of words, so it needs to be full-text searchable. The "type" member variable is just a single word.

The search query will be a tree itself and will search both the data and type in each node and also the structure of the tree for a match. Before matching against a child node, the query must first match the parent node data and type. Approximate matching on the data value is acceptable.
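To make the intended matching rule concrete, here is a minimal in-memory sketch with no index involved. The class name `TreeMatchSketch`, the "share at least one word" rule for approximate matching, and the choice to let two query children match the same stored child are all assumptions made for illustration, not part of the requirement.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TreeMatchSketch {
    static class Node {
        String data;
        String type;
        List<Node> children = new ArrayList<>();

        Node(String data, String type, Node... kids) {
            this.data = data;
            this.type = type;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Assumed approximation: types must be equal and the two data strings
    // must share at least one word (case-insensitive).
    static boolean nodeMatches(Node query, Node candidate) {
        if (!query.type.equals(candidate.type)) {
            return false;
        }
        Set<String> words =
            new HashSet<>(Arrays.asList(candidate.data.toLowerCase().split("\\s+")));
        for (String w : query.data.toLowerCase().split("\\s+")) {
            if (words.contains(w)) {
                return true;
            }
        }
        return false;
    }

    // The parent is checked first; children are only examined once it matches.
    // Every query child must match some child of the candidate node.
    static boolean treeMatches(Node query, Node candidate) {
        if (!nodeMatches(query, candidate)) {
            return false;
        }
        for (Node queryChild : query.children) {
            boolean found = false;
            for (Node candidateChild : candidate.children) {
                if (treeMatches(queryChild, candidateChild)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }
}
```

Whatever storage is chosen would need to answer this kind of check without scanning every stored tree.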

What's the best way to index this kind of data? If Lucene does not directly support indexing such data, can this be done with Solr or Elasticsearch?

I took a quick look at neo4j, but it seems to store an entire graph in the db, not a large collection (say billions or trillions) of small tree structures. Or is my understanding wrong?

Also, is a non-Lucene-based NoSQL solution better suited for this?

asked Apr 02 '12 02:04

Golam Kawsar

2 Answers

Another approach is to store a representation of the current node's location in the tree. For example, the 17th leaf of the 3rd 2nd-level node of the 1st 1st-level node of the 14th tree would be represented as 014.001.003.017.
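As a rough sketch of how such paths could be produced, the Java below zero-pads each position to three digits (the width is taken from the example above, so it assumes at most 999 siblings per level) and walks a tree to assign a path to every node. The names `TreePathEncoder`, `segment`, `encode` and `assignPaths` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TreePathEncoder {
    static class Node {
        String data;
        String type;
        List<Node> children = new ArrayList<>();

        Node(String data, String type) {
            this.data = data;
            this.type = type;
        }
    }

    // One path segment: a 1-based position, zero-padded to three digits.
    static String segment(int position) {
        return String.format(Locale.ROOT, "%03d", position);
    }

    // Joins positions from the tree number down to the node, separated by dots.
    static String encode(int... positions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < positions.length; i++) {
            if (i > 0) {
                sb.append('.');
            }
            sb.append(segment(positions[i]));
        }
        return sb.toString();
    }

    // Records the path of every node in the subtree, parent before children.
    static void assignPaths(Node node, String path, Map<String, Node> out) {
        out.put(path, node);
        for (int i = 0; i < node.children.size(); i++) {
            assignPaths(node.children.get(i), path + "." + segment(i + 1), out);
        }
    }
}
```

For instance, `TreePathEncoder.encode(14, 1, 3, 17)` yields `014.001.003.017`. Each (path, type, data) triple would then become one document in the index.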

Assuming 'treepath' is the field name of the tree location, and that the field is indexed as a single untokenized term, you would query on 'treepath:014*' to find all nodes and leaves in the 14th tree. Similarly, to find everything below the root of the 14th tree (all descendants, not only its direct children) you would query on 'treepath:014.*'.
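The difference between these prefix patterns can be checked without an index. The sketch below imitates a trailing-wildcard query with `String.startsWith` over a list of stored paths; it does not call Lucene, and `TreePathQuerySketch`, `prefixQuery` and `directChildren` are hypothetical names. It also shows that selecting only the direct children takes an extra check on the number of segments.

```java
import java.util.ArrayList;
import java.util.List;

class TreePathQuerySketch {
    // Imitates a query such as treepath:014* against untokenized path values.
    static List<String> prefixQuery(List<String> treepaths, String pattern) {
        if (!pattern.endsWith("*")) {
            throw new IllegalArgumentException("pattern must end with *");
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        List<String> hits = new ArrayList<>();
        for (String path : treepaths) {
            if (path.startsWith(prefix)) {
                hits.add(path);
            }
        }
        return hits;
    }

    // Direct children have exactly one more segment than the parent path.
    static List<String> directChildren(List<String> treepaths, String parent) {
        String prefix = parent + ".";
        List<String> hits = new ArrayList<>();
        for (String path : treepaths) {
            if (path.startsWith(prefix) && path.indexOf('.', prefix.length()) == -1) {
                hits.add(path);
            }
        }
        return hits;
    }
}
```

With the paths `014`, `014.001`, `014.001.003` and `015`, the pattern `014*` selects the first three, `014.*` selects `014.001` and `014.001.003`, and `directChildren(paths, "014")` selects only `014.001`.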

The major problem with this approach is that moving branches around requires re-ordering every branch after the branch that was moved. If your trees are relatively static, that may only be a minor problem in practice.

(I've seen this approach called either a 'path enumeration' or a 'Dewey Decimal' representation.)

answered Oct 19 '22 06:10

Mark Leighton Fisher

This requirement and its solution are captured here: Proposal for nested docs

This design was subsequently implemented in both core Lucene and Elasticsearch. BlockJoinQuery is the core Lucene implementation (later Lucene releases renamed it ToParentBlockJoinQuery), and Elasticsearch appears to have an implementation as outlined here: Elastic search nested docs
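Block joins depend on each parent and its children being indexed together as one contiguous block, with the parent document last. The sketch below only illustrates that ordering for a small tree by emitting nodes in post-order; it is my assumption that the same descendants-first layout is what you would want for trees deeper than two levels. The field maps stand in for Lucene documents and nothing here calls the Lucene API. `BlockOrderSketch` and `appendBlock` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlockOrderSketch {
    static class Node {
        String data;
        String type;
        List<Node> children = new ArrayList<>();

        Node(String data, String type) {
            this.data = data;
            this.type = type;
        }
    }

    // Appends the subtree in post-order: all descendants first, the node itself last.
    static void appendBlock(Node node, List<Map<String, String>> block) {
        for (Node child : node.children) {
            appendBlock(child, block);
        }
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("type", node.type);
        doc.put("data", node.data);
        block.add(doc);
    }
}
```

In a real index, the resulting list would be converted to documents and added in a single call so that the block stays contiguous.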

answered Oct 19 '22 05:10

MarkH

Tags: nosql, solr, lucene, elasticsearch, neo4j
