
Solr documents with child elements?

Is it somehow possible to create a solr document that contains sub-elements?

For example, how would I represent something like this:

<person first="Bob" last="Smith">
   <children>
      <child first="Little" last="Smith" />
      <child first="Junior" last="Smith" />
   </children>
</person>

What is the usual way to solve this problem?

cambo asked Apr 07 '11 17:04


3 Answers

As of Solr 4.7/4.8, Solr supports nested documents, which can be indexed in JSON like this:

{
  "id": "chapter1",
  "title": "Indexing Child Documents in JSON",
  "content_type": "chapter",
  "_childDocuments_": [
    {
      "id": "1-1",
      "content_type": "page",
      "text": "ho hum... this is page 1 of chapter 1"
    },
    {
      "id": "1-2",
      "content_type": "page",
      "text": "more text... this is page 2 of chapter 1"
    }
  ]
}

See the Solr release notes for more.
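A minimal Python sketch of putting the question's person/children example into that `_childDocuments_` shape and posting it to Solr's JSON update handler. Only `_childDocuments_` and `content_type` come from the example above; the `person`/`child` type values, the id scheme, the helper names and the core URL are hypothetical.

```python
import json
import urllib.request


def person_to_nested_doc(person_id, first, last, children):
    """Build a parent document whose children go under _childDocuments_.

    The "person"/"child" content_type values and the id scheme are hypothetical.
    """
    return {
        "id": person_id,
        "content_type": "person",
        "first": first,
        "last": last,
        "_childDocuments_": [
            {
                "id": f"{person_id}-child-{i}",
                "content_type": "child",
                "first": child_first,
                "last": child_last,
            }
            for i, (child_first, child_last) in enumerate(children, start=1)
        ],
    }


def post_to_solr(docs, update_url):
    """POST a JSON array of documents to a Solr update handler (not called here)."""
    body = json.dumps(docs).encode("utf-8")
    request = urllib.request.Request(
        update_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


doc = person_to_nested_doc(
    "bob", "Bob", "Smith", [("Little", "Smith"), ("Junior", "Smith")]
)
print(json.dumps(doc, indent=2))
# Hypothetical core name; adjust to your installation before calling:
# post_to_solr([doc], "http://localhost:8983/solr/people/update?commit=true")
```

The parent and its children are sent together in one request, which is what lets Solr index them as a single block.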

whomer answered Nov 03 '22 23:11


You can model this in different ways, depending on your searching/faceting needs. Usually you'll use multivalued or dynamic fields. In the following examples I'll omit the field type and the indexed and stored attributes:

<field name="first"/>
<field name="last"/>
<field name="child_first" multiValued="true"/>
<field name="child_last" multiValued="true"/>
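A minimal Python sketch of filling those two parallel multivalued fields and correlating them again on the way out. It assumes Solr returns stored multivalued values in the order they were indexed, so position i in `child_first` and `child_last` belongs to the same child; the helper names are hypothetical.

```python
def flatten_multivalued(first, last, children):
    """Put each child's names into two parallel multivalued fields."""
    return {
        "first": first,
        "last": last,
        "child_first": [child_first for child_first, _ in children],
        "child_last": [child_last for _, child_last in children],
    }


def children_from_doc(doc):
    """Re-pair first and last names by position (assumes order is preserved)."""
    return list(zip(doc.get("child_first", []), doc.get("child_last", [])))


doc = flatten_multivalued("Bob", "Smith", [("Little", "Smith"), ("Junior", "Smith")])
print(children_from_doc(doc))
```

The correlation lives only in your client code: Solr itself does not know that `child_first[0]` and `child_last[0]` describe the same child.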

It's up to you to correlate the children's first and last names. Or you could just put both in a single field:

<field name="first"/>
<field name="last"/>
<field name="child_first_and_last" multiValued="true"/>
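A sketch of the single-field variant, assuming `child_first_and_last` is a tokenized text field with a `positionIncrementGap` (as in Solr's example schema), so that a phrase query should not match across two different values. The helper names are hypothetical.

```python
def flatten_combined(first, last, children):
    """Store each child as one 'first last' value in a single multivalued field."""
    return {
        "first": first,
        "last": last,
        "child_first_and_last": [f"{cf} {cl}" for cf, cl in children],
    }


def child_phrase_query(child_first, child_last):
    # A phrase query requires the two names to be adjacent, i.e. in the same value.
    return f'child_first_and_last:"{child_first} {child_last}"'


doc = flatten_combined("Bob", "Smith", [("Little", "Smith"), ("Junior", "Smith")])
print(doc["child_first_and_last"])
print(child_phrase_query("Little", "Smith"))
```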

Another option is to use dynamic fields:

<field name="first"/>
<field name="last"/>
<dynamicField name="child_first_*"/>
<dynamicField name="child_last_*"/>

Here you would store fields 'child_first_1', 'child_last_1', 'child_first_2', 'child_last_2', etc. Again it's up to you to correlate values, but at least you have an index. With some code you could make this transparent.
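One way to write that "some code" is a pair of helpers that number the children on the way in and group the numbered fields back into (first, last) pairs on the way out. This is a minimal sketch; the function names are hypothetical, and only the `child_first_*`/`child_last_*` naming comes from the schema above.

```python
import re

# Matches the numbered dynamic field names, e.g. child_first_2 or child_last_10.
_CHILD_FIELD = re.compile(r"^child_(first|last)_(\d+)$")


def to_dynamic_fields(first, last, children):
    """Give each child a numbered pair of dynamic fields, starting at 1."""
    doc = {"first": first, "last": last}
    for i, (child_first, child_last) in enumerate(children, start=1):
        doc[f"child_first_{i}"] = child_first
        doc[f"child_last_{i}"] = child_last
    return doc


def from_dynamic_fields(doc):
    """Group child_first_N / child_last_N back into (first, last) pairs by N."""
    slots = {}
    for name, value in doc.items():
        match = _CHILD_FIELD.match(name)
        if match:
            part, index = match.group(1), int(match.group(2))
            slots.setdefault(index, {})[part] = value
    # Sort numerically so child 10 comes after child 9.
    return [(slots[i].get("first"), slots[i].get("last")) for i in sorted(slots)]


doc = to_dynamic_fields("Bob", "Smith", [("Little", "Smith"), ("Junior", "Smith")])
print(from_dynamic_fields(doc))
```

Unlike the parallel-list approach, this does not rely on Solr preserving the order of multivalued values, because the child number is part of the field name.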

Bottom line: as the Solr wiki says, "Solr provides one table. Storing a set of database tables in an index generally requires denormalizing some of the tables. Attempts to avoid denormalizing usually fail." It's up to you to denormalize your data according to your search needs.

UPDATE: Since version 4.5 or so, Solr supports nested documents directly, queried through the block join query parsers: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
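A sketch of the request parameters for the two block join parsers (`{!parent which=...}` and `{!child of=...}`), assuming parents are marked with a hypothetical `content_type:person` value. The `which`/`of` filter must match all parent documents; the helper names are hypothetical.

```python
from urllib.parse import urlencode


def parents_of_matching_children(child_query, parent_filter="content_type:person"):
    # {!parent which=...}: return parents that have at least one child matching child_query.
    return {"q": '{!parent which="%s"}%s' % (parent_filter, child_query), "wt": "json"}


def children_of_matching_parents(parent_query, parent_filter="content_type:person"):
    # {!child of=...}: return the children of parents matching parent_query.
    return {"q": '{!child of="%s"}%s' % (parent_filter, parent_query), "wt": "json"}


# Both clauses must match the *same* child document.
params = parents_of_matching_children("+first:Little +last:Smith")
print("/select?" + urlencode(params))
```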

Mauricio Scheffer answered Nov 03 '22 23:11


Using separate fields for the children leads to false-positive matches: a query for one child's first name and another child's last name still matches the parent. Concatenated fields work to some extent, but it's a really limited approach. We have a lot of experience with similar tasks, blogged at http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
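The false positive can be seen without a running Solr instance. Below is a small Python simulation, assuming hypothetical children with different last names: an AND of two conditions on separate multivalued fields is checked field by field, while a per-child check (what a block join gives you) is not fooled. The helper names are hypothetical.

```python
def matches_flattened(doc, first, last):
    # Separate multivalued fields: each condition is satisfied by *any* value.
    return first in doc["child_first"] and last in doc["child_last"]


def matches_per_child(children, first, last):
    # Per-child check: both conditions must hold for the same child.
    return any(cf == first and cl == last for cf, cl in children)


# Hypothetical children with different last names.
children = [("Little", "Smith"), ("Junior", "Jones")]
flat = {
    "child_first": [cf for cf, _ in children],
    "child_last": [cl for _, cl in children],
}

print(matches_flattened(flat, "Little", "Jones"))      # True: false positive
print(matches_per_child(children, "Little", "Jones"))  # False: no such child
```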

mkhludnev answered Nov 03 '22 23:11