
Storing arrays in Solr

Tags:

solr

How can one store an array of values in a Solr index? I am specifically trying to formulate a schema.xml file.

Consider the following potential Solr document:

ID: 351
Name: Beatles
Members:
    1) Name: John
       Instrument: Guitar
    2) Name: Paul
       Instrument: Guitar
    3) Name: George
       Instrument: Bass
    4) Name: Ringo
       Instrument: Drums

In MySQL I would have three tables, like so:

Bands:
    BandID
    Name
People:
    PersonID
    Name
    Instrument
BandsPeople:
    BandID references Bands(BandID)
    PersonID references People(PersonID)

Disregarding the concept that a person could belong to multiple bands and other advantages of the MySQL approach, my goal is to learn how to store arrays in Solr. The band is simply an example and possibly not a good one at that!

The obvious approach for having multiple Members would be a multiValued field:

<field name="member" stored="true" type="string" multiValued="true" indexed="true"/>

However, that multiValued field itself needs to have subvalues. I do not see any documentation on how to formulate the schema. Note that I am using Solr 4. Thanks.

asked Sep 11 '12 07:09 by dotancohen


2 Answers

There are a few options, but sadly none of them uses a multiValued field.

  • Leverage Lucene's nested documents (this may perform poorly).
  • Heavily denormalize the documents, with one document per option, then use Lucene's Grouping feature. (This is the 'Solr/Lucene way'.)
  • Follow the span query and term vector with offsets advice in the following blog posts: http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html http://blog.griddynamics.com/2011/07/solr-experience-search-parent-child.html http://blog.griddynamics.com/2011/10/solr-experience-search-parent-child.html
  • Create indexed facet names.
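
To make the denormalization option concrete, here is a minimal sketch of splitting the band from the question into one flat document per member. It only builds plain dictionaries and does not talk to Solr. The field names (`band_id`, `band`, `member`, `instrument`) and the `351_0`-style ids are assumptions chosen for illustration, not anything Solr requires.

```python
# Hypothetical input shape for the band from the question.
band = {
    "id": "351",
    "name": "Beatles",
    "members": [
        {"name": "John", "instrument": "Guitar"},
        {"name": "Paul", "instrument": "Guitar"},
        {"name": "George", "instrument": "Bass"},
        {"name": "Ringo", "instrument": "Drums"},
    ],
}


def denormalize(band):
    """Return one flat document per member, repeating the band fields on each."""
    docs = []
    for position, member in enumerate(band["members"]):
        docs.append({
            # Assumed id scheme: band id plus member position, so each document is unique.
            "id": f'{band["id"]}_{position}',
            # Assumed field that results could later be grouped on.
            "band_id": band["id"],
            "band": band["name"],
            "member": member["name"],
            "instrument": member["instrument"],
        })
    return docs


docs = denormalize(band)
```

With documents shaped like this, a query could ask Solr to collapse the hits back to one entry per band using result grouping (for example `group=true&group.field=band_id`, assuming the `band_id` field from the sketch).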

For indexed facet names, your data (one document) would look like this:

id="351" band="Beatles" 
   member_0="John" instrument_0="Guitar" 
   member_1="Paul" instrument_1="Guitar" 
   ...

With relatively short lists (fewer than a few hundred entries), the last option is the easiest on your document size and complexity, but it pushes the search complexity onto the client, which has to know which numbered fields to query.
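
As a rough illustration of the numbered-field layout shown above, the following sketch flattens a nested band record into a single document with `member_N` / `instrument_N` keys. The input shape and the function name `flatten_indexed` are hypothetical. It produces a plain dictionary and leaves the actual indexing call to whatever client you use.

```python
# Hypothetical input shape for the band from the question.
band = {
    "id": "351",
    "name": "Beatles",
    "members": [
        {"name": "John", "instrument": "Guitar"},
        {"name": "Paul", "instrument": "Guitar"},
        {"name": "George", "instrument": "Bass"},
        {"name": "Ringo", "instrument": "Drums"},
    ],
}


def flatten_indexed(band):
    """Build one document whose member fields carry a numeric suffix."""
    doc = {"id": band["id"], "band": band["name"]}
    for i, member in enumerate(band["members"]):
        # The shared suffix i is what ties a member name to its instrument.
        doc[f"member_{i}"] = member["name"]
        doc[f"instrument_{i}"] = member["instrument"]
    return doc


doc = flatten_indexed(band)
```

On the schema side, one way to avoid declaring every numbered field would be dynamic field patterns such as `member_*` and `instrument_*`. That is an assumption about how you set up schema.xml, not part of the layout above.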

answered Dec 10 '22 08:12 by inanutshellus


Lucene already has joining, so you could approximate the DB schema with it, with some caveats; see Grouping and Joining in Lucene/Solr. Solr will eventually get access to that as well; check the ongoing work.

answered Dec 10 '22 08:12 by Persimmonium