I use SOLR to store documents whose meta data is composed of multiple values, usually an id with a label. A simple example would be the name of a city and the unique id of that city. The id is needed because different cities can have the same name, like Berlin in Germany and Berlin in the US. The name is obviously needed because I want to search for that string.
If I use facets, I would like to get back two facets having the label "Berlin". If I restrict my search (using some other meta data field) to documents from Germany, I would expect to get only one facet, for the German Berlin. Obviously this does not work if I store id and label in two separate SOLR fields.
I would assume that this is not an uncommon requirement, but I was not able to find any useful information. My current approaches are:
Implement a complete custom field type in Java: Hard to estimate for me, because I'm currently just a SOLR user, not a SOLR developer.
Put id and label in a single string (like "123:Berlin" and "456:Berlin") and define a custom field type in schema.xml using a custom analyzer which splits the value. Sounds reasonable to me, but I'm not 100% sure if it will work with faceting.
I found some references to subfields, but only on older pages and I was not able to find useful documentation.
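To make the combined-string idea concrete, here is a minimal sketch of splitting such values on the client side instead of in an analyzer. It assumes the field is indexed as an unanalyzed string, so facet values come back whole (e.g. "123:Berlin"), and that the response uses the flat value/count list layout of SOLR's JSON facet output. The function names and the sample counts are made up for illustration.

```python
def split_id_label(value, sep=":"):
    """Split a combined value such as "123:Berlin" into (id, label).

    Only the first separator is used, so labels may contain the separator.
    """
    city_id, label = value.split(sep, 1)
    return city_id, label


def parse_flat_facet(flat):
    """Turn a flat [value, count, value, count, ...] list into records."""
    records = []
    for value, count in zip(flat[0::2], flat[1::2]):
        city_id, label = split_id_label(value)
        records.append({"id": city_id, "label": label, "count": count})
    return records


# Hypothetical facet counts for a combined id:label field.
sample = ["123:Berlin", 4, "456:Berlin", 2]
facets = parse_flat_facet(sample)
```

With this layout both Berlins stay separate facet entries, and the label can still be shown to the user without a second lookup.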
Is there some well known way to solve this in SOLR?
Pivot faceting can work.
Say you have the fields: cityId, cityName, country
Do a pivot facet over cityId and cityName by using the query parameter:
facet.pivot=cityId,cityName
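As a rough sketch, the full request could be assembled like this. The host, port and core name (`cities`) are assumptions for illustration only; `q`, `rows`, `facet`, `facet.pivot` and `wt` are standard SOLR request parameters.

```python
from urllib.parse import urlencode


def build_pivot_query(base_url, pivot_fields, query="*:*"):
    """Build a select URL that asks only for a pivot facet (no documents)."""
    params = {
        "q": query,
        "rows": 0,                              # facet counts only
        "facet": "true",
        "facet.pivot": ",".join(pivot_fields),  # e.g. cityId,cityName
        "wt": "xml",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical core URL; adjust to your own installation.
url = build_pivot_query(
    "http://localhost:8983/solr/cities/select",
    ["cityId", "cityName"],
)
```

Restricting by country would then just mean passing a different `query` (or adding a filter query), and the pivot is computed over the restricted result set.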
At the first level, as with a standard facet, you will get each city ID. At the second level, you will get the name of each city. Given that each city ID has only one name, you can simply read each city ID's name from the next facet level (under the pivot element in the XML).
<lst name="facet_pivot">
  <arr name="cityId,cityName">
    <lst>
      <str name="field">cityId</str>
      <str name="value">1</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">berlin</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
    <lst>
      <str name="field">cityId</str>
      <str name="value">2</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">berlin</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
    <lst>
      <str name="field">cityId</str>
      <str name="value">3</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">melbourne</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
  </arr>
</lst>
Basically, if the ID is unique, you are guaranteed to have only one pivot value at the second level.
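Reading the names off the second level can be sketched as follows, using only Python's standard library. The sample is an abbreviated copy of a `facet_pivot` section with field names following the `facet.pivot` parameter; the helper names are made up, and the code relies on the assumption stated above that each ID has exactly one name.

```python
import xml.etree.ElementTree as ET

# Abbreviated facet_pivot section (two of the entries only).
SAMPLE = """
<lst name="facet_pivot">
  <arr name="cityId,cityName">
    <lst>
      <str name="field">cityId</str>
      <str name="value">1</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">berlin</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
    <lst>
      <str name="field">cityId</str>
      <str name="value">2</str>
      <int name="count">1</int>
      <arr name="pivot">
        <lst>
          <str name="field">cityName</str>
          <str name="value">berlin</str>
          <int name="count">1</int>
        </lst>
      </arr>
    </lst>
  </arr>
</lst>
"""


def entry_value(lst, key):
    """Return the text of the direct child whose name attribute equals key."""
    for child in lst:
        if child.get("name") == key:
            return child.text
    return None


def id_to_name(facet_pivot_xml):
    """Map each first-level ID to its single second-level name and its count."""
    root = ET.fromstring(facet_pivot_xml)
    mapping = {}
    for entry in root.findall("./arr/lst"):
        city_id = entry_value(entry, "value")
        count = int(entry_value(entry, "count"))
        for sub in entry.findall("./arr[@name='pivot']/lst"):
            mapping[city_id] = {"name": entry_value(sub, "value"), "count": count}
    return mapping


cities = id_to_name(SAMPLE)
```

Both Berlins end up as separate keys ("1" and "2"), each carrying the label to display.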
Optionally, if you want to group your 'Berlins' together, just reverse the order of the facet pivot and make it:
facet.pivot=cityName,cityId
and you will get 'Berlin' at the first level and possibly multiple IDs at the second level. As a bonus, you could add country as a third level (facet.pivot=cityName,cityId,country) so that you can read the country for each city off the third level.
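A small sketch of walking such a nested result, here for the three-level variant requested as JSON (`wt=json`), where each pivot entry is an object with `field`, `value`, `count` and an optional nested `pivot` list. The sample values, including the countries, are invented for illustration.

```python
def flatten_pivot(entries, path=()):
    """Yield (path, count) for every leaf of a nested pivot result."""
    for entry in entries:
        current = path + (entry["value"],)
        children = entry.get("pivot")
        if children:
            # Descend until the last pivot level is reached.
            yield from flatten_pivot(children, current)
        else:
            yield current, entry["count"]


# Hypothetical content of facet_pivot["cityName,cityId,country"].
sample = [
    {"field": "cityName", "value": "berlin", "count": 2, "pivot": [
        {"field": "cityId", "value": "1", "count": 1, "pivot": [
            {"field": "country", "value": "germany", "count": 1}]},
        {"field": "cityId", "value": "2", "count": 1, "pivot": [
            {"field": "country", "value": "us", "count": 1}]},
    ]},
    {"field": "cityName", "value": "melbourne", "count": 1, "pivot": [
        {"field": "cityId", "value": "3", "count": 1, "pivot": [
            {"field": "country", "value": "australia", "count": 1}]},
    ]},
]

rows = list(flatten_pivot(sample))
```

Each row is then a (name, id, country) path with its document count, which is convenient for rendering grouped facet entries.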
There seems to be no out-of-the-box solution.
You can also look at pivot facets, which provide a form of hierarchical faceting.