 

Store complex (i.e. label + id) meta data in SOLR document

Tags:

solr

lucene

I use SOLR to store documents that have some metadata composed of multiple values, usually an id with a label. A simple example is the name of a city together with the unique id of that city. The id is needed because different cities can have the same name, like Berlin in Germany and Berlin in the US. The name is obviously needed because I want to search for that string.

If I use facets, I would like to get back two facets with the label "Berlin". If I restrict my search (using some other metadata field) to documents from Germany, I would expect to get only one facet, for the German Berlin. Obviously this does not work if I store id and label in two separate SOLR fields.

I would assume that this is not an uncommon requirement, but I was not able to find any useful information. My current approaches are:

  • Implement a complete custom field type in Java: the effort is hard for me to estimate, because I'm currently just a SOLR user, not a SOLR developer.

  • Put id and label in a single string (like "123:Berlin" and "456:Berlin") and define custom field types in schema.xml using a custom analyzer that splits the value. This sounds reasonable to me, but I'm not 100% sure it will work with faceting.

  • I found some references to subfields, but only on older pages and I was not able to find useful documentation.

Is there some well known way to solve this in SOLR?

Achim asked May 20 '13 at 20:05


2 Answers

Pivot faceting can work.

Say you have the fields: cityId, cityName, country

Do a pivot facet over cityId and cityName using the query parameters (pivot faceting only runs when faceting is switched on):

facet=true&facet.pivot=cityId,cityName
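As a minimal sketch of issuing that request from a client, the Python below only builds the query string with the standard library. The optional fq shows the restriction from the question (only German cities); the filter value country:Germany, the helper name pivot_facet_query and the core URL in the comment are hypothetical, and q=*:* with rows=0 is just one reasonable choice for a facet-only request.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def pivot_facet_query(pivot_fields: Sequence[str], filter_query: Optional[str] = None) -> str:
    """Build the query string for a facet-only pivot request (hypothetical helper)."""
    params = [
        ("q", "*:*"),                             # match all documents
        ("rows", "0"),                            # only facet counts are needed, no documents
        ("facet", "true"),                        # pivot facets require faceting to be on
        ("facet.pivot", ",".join(pivot_fields)),  # e.g. "cityId,cityName"
    ]
    if filter_query is not None:
        params.append(("fq", filter_query))       # e.g. restrict to one country (assumed field)
    return urlencode(params)


query = pivot_facet_query(["cityId", "cityName"], filter_query="country:Germany")
# Append to the select handler of your core, e.g. http://localhost:8983/solr/<core>/select?
print(query)
```

Because the fq is applied before faceting, only the cities matching the filter appear in the pivot, which is the behaviour the question asks for.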

At the first level, like a standard facet, you get each city ID. At the second level, you get the name of each city. Since each city ID has only one name, you can simply read each city ID's name from the next facet level (the pivot element in the XML):

<lst name="facet_pivot">
    <arr name="cityId,cityName">
        <lst>
            <str name="field">cityId</str>
            <str name="value">1</str>
            <int name="count">1</int>
            <arr name="pivot">
                <lst>
                    <str name="field">cityName</str>
                    <str name="value">berlin</str>
                    <int name="count">1</int>
                </lst>
            </arr>
        </lst>
        <lst>
            <str name="field">cityId</str>
            <str name="value">2</str>
            <int name="count">1</int>
            <arr name="pivot">
                <lst>
                    <str name="field">cityName</str>
                    <str name="value">berlin</str>
                    <int name="count">1</int>
                </lst>
            </arr>
        </lst>
        <lst>
            <str name="field">cityId</str>
            <str name="value">3</str>
            <int name="count">1</int>
            <arr name="pivot">
                <lst>
                    <str name="field">cityName</str>
                    <str name="value">melbourne</str>
                    <int name="count">1</int>
                </lst>
            </arr>
        </lst>
    </arr>
</lst>

Basically, as long as each ID maps to exactly one name, you are guaranteed to get only one pivot value at the second level.
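Reading the XML response shape from this answer into an id-to-name lookup takes only a few lines with Python's xml.etree.ElementTree. This is a sketch under the assumption that the facet_pivot section looks like the sample (a hand-written, trimmed copy using the cityId and cityName fields); the function name id_to_name is hypothetical. It also checks the uniqueness assumption and fails loudly if an id ever carries more than one name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hand-written, trimmed sample of the facet_pivot section of a Solr XML response.
FACET_PIVOT = """
<lst name="facet_pivot">
  <arr name="cityId,cityName">
    <lst>
      <str name="field">cityId</str>
      <str name="value">1</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">berlin</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
    <lst>
      <str name="field">cityId</str>
      <str name="value">2</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">berlin</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
  </arr>
</lst>
"""


def id_to_name(facet_pivot_xml: str, pivot: str = "cityId,cityName") -> Dict[str, str]:
    """Map each first-level value (city id) to its single second-level value (name)."""
    root = ET.fromstring(facet_pivot_xml.strip())
    arr = root.find(f"arr[@name='{pivot}']")
    if arr is None:
        raise KeyError(pivot)
    mapping = {}
    for entry in arr.findall("lst"):
        # Direct child <str name="value"> holds the first-level value (the id).
        city_id = entry.find("str[@name='value']").text
        # Second-level values live under <arr name="pivot">.
        names = [n.find("str[@name='value']").text
                 for n in entry.findall("arr[@name='pivot']/lst")]
        if len(names) != 1:
            raise ValueError(f"id {city_id} has {len(names)} names: {names}")
        mapping[city_id] = names[0]
    return mapping


print(id_to_name(FACET_PIVOT))
```

Both Berlins stay separate because the lookup is keyed by id, not by label.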

Optionally, if you want to group your 'Berlins' together, just reverse the order of the facet pivot and make it:

facet.pivot=cityName,cityId

and you will get 'Berlin' at the first level and possibly multiple IDs at the second level. As a bonus, you could add country as a third level (facet.pivot=cityName,cityId,country) so that you can read the country for each city off the third level.
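For the reversed three-level pivot, a small recursive walk can group the (id, country) pairs under each name. This sketch assumes the response was requested with wt=json, where each pivot entry is an object with field, value, count and an optional nested pivot list; the sample data and the helper names pivot_paths and group_by_name are illustrative only, not taken from a real index.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Optional, Tuple

# Hand-written sample of facet_counts.facet_pivot["cityName,cityId,country"] (wt=json).
PIVOT = [
    {"field": "cityName", "value": "berlin", "count": 2, "pivot": [
        {"field": "cityId", "value": "1", "count": 1, "pivot": [
            {"field": "country", "value": "DE", "count": 1}]},
        {"field": "cityId", "value": "2", "count": 1, "pivot": [
            {"field": "country", "value": "US", "count": 1}]},
    ]},
    {"field": "cityName", "value": "melbourne", "count": 1, "pivot": [
        {"field": "cityId", "value": "3", "count": 1, "pivot": [
            {"field": "country", "value": "AU", "count": 1}]},
    ]},
]


def pivot_paths(entries: List[Dict[str, Any]],
                prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield one tuple of values per leaf, e.g. ("berlin", "1", "DE")."""
    for entry in entries:
        path = prefix + (entry["value"],)
        children = entry.get("pivot")
        if children:
            yield from pivot_paths(children, path)
        else:
            yield path


def group_by_name(entries: List[Dict[str, Any]]) -> Dict[str, List[Tuple[str, Optional[str]]]]:
    """Group (cityId, country) pairs under each city name.

    Assumes every leaf path has at least a name and an id; country may be missing.
    """
    groups: Dict[str, List[Tuple[str, Optional[str]]]] = defaultdict(list)
    for name, city_id, *rest in pivot_paths(entries):
        groups[name].append((city_id, rest[0] if rest else None))
    return dict(groups)


print(group_by_name(PIVOT))
```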

prunge answered Nov 08 '22 at 07:11


There seems to be no out-of-the-box solution.

  1. Your approach #2 should work fine with some client-side modifications.
  2. You can index your data as a single string field combining id and name (e.g. id_name). This needs a change at indexing time, which is easier with Transformers if you are using the DataImportHandler (DIH).
  3. You would then get a distinct facet value for each id, and on the client side you can always split the facet values for display.
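The index-time joining and client-side splitting described in these steps can be sketched in Python as follows. The ":" separator (as in the question's "123:Berlin") and the helper names are assumptions; the facet list mimics Solr's default flat JSON layout for facet_fields, [value1, count1, value2, count2, ...].

```python
from typing import List, Sequence, Tuple

SEPARATOR = ":"  # assumed separator; must never occur inside an id


def encode(city_id: str, name: str) -> str:
    """Build the single indexed value, e.g. encode("123", "Berlin") gives "123:Berlin"."""
    if SEPARATOR in city_id:
        raise ValueError(f"separator {SEPARATOR!r} not allowed in id {city_id!r}")
    return f"{city_id}{SEPARATOR}{name}"


def decode(value: str) -> Tuple[str, str]:
    """Split on the first separator only, so names may themselves contain ':'."""
    city_id, _, name = value.partition(SEPARATOR)
    return city_id, name


def decode_facets(flat: Sequence) -> List[Tuple[str, str, int]]:
    """Turn a flat [value, count, value, count, ...] list into (id, label, count) triples."""
    return [(*decode(value), count) for value, count in zip(flat[::2], flat[1::2])]


facets = [encode("123", "Berlin"), 5, encode("456", "Berlin"), 2]
print(decode_facets(facets))
```

Splitting only on the first separator is why the id must not contain it, while labels are free to. When a user selects one of these values, the colon needs escaping in a normal fq; Solr's term query parser ({!term f=...}) avoids that, since it matches the raw value.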

You can also look at pivot facets, which provide hierarchical faceting.

Jayendra answered Nov 08 '22 at 08:11