
How to perform nested aggregation on multiple fields in Solr?

I am trying to perform search result aggregation (count and sum) grouping by several fields in a nested fashion.

For example, with the schema shown at the end of this post, I'd like to be able to get the sum of "size" grouped by "category" and sub-grouped further by "subcategory" and get something like this:

<category name="X">
  <subcategory name="X_A">
    <size sum="..." />
  </subcategory>
  <subcategory name="X_B">
    <size sum="..." />
  </subcategory>
</category>
....

I've been looking primarily at Solr's Stats component which, as far as I can see, doesn't allow nested aggregation.

I'd appreciate it if anyone knows of some way to implement this, with or without the Stats component.

Here is a cut-down version of the target schema:

<types>
  <fieldType name="string" class="solr.StrField" />
  <fieldType name="text" class="solr.TextField">
    <analyzer><tokenizer class="solr.StandardTokenizerFactory" /></analyzer>
  </fieldType>
  <fieldType name="date" class="solr.DateField" />
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
</types>

<fields>
  <field name="id" type="string" indexed="true" stored="true" />
  <field name="category" type="text" indexed="true" stored="true" />
  <field name="subcategory" type="text" indexed="true" stored="true" />
  <field name="pdate" type="date" indexed="true" stored="true" />
  <field name="size" type="int" indexed="true" stored="true" />
</fields>
Aeham asked Mar 23 '23 05:03



1 Answer

The new faceting module in Solr 5.1 can do this. It was added in https://issues.apache.org/jira/browse/SOLR-7214

Here is how you would add sum(size) to every facet bucket, and sort descending by that statistic.

json.facet={
  categories:{terms:{
    field:category,
    sort:"total_size desc",  // this will sort the facet buckets by your stat 
    facet:{
      total_size:"sum(size)"  // this calculates the stat per bucket
    }
  }}
}
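The request above uses the relaxed JSON syntax that Solr accepts (unquoted keys, inline comments). If you build the request from a program, generating strict JSON is usually easier. Here is a minimal sketch in Python that only assembles the request URL and does not contact a server. The host, the core name `mycore`, and the helper name `build_facet_params` are assumptions for illustration, not part of the question.

```python
import json
from urllib.parse import urlencode


def build_facet_params(query="*:*"):
    """Return Solr request parameters with a strict-JSON json.facet value."""
    facet = {
        "categories": {
            "terms": {
                "field": "category",
                "sort": "total_size desc",  # sort buckets by the stat below
                "facet": {"total_size": "sum(size)"},  # stat computed per bucket
            }
        }
    }
    # rows=0 skips the document list, since only the aggregates are wanted here.
    return {"q": query, "rows": 0, "wt": "json", "json.facet": json.dumps(facet)}


# Hypothetical host and core name; adjust to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"

params = build_facet_params()
request_url = SOLR_SELECT_URL + "?" + urlencode(params)
print(request_url)
```

Serializing the facet definition with `json.dumps` and letting `urlencode` escape it avoids hand-quoting braces and spaces in the URL.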

And this is how you would add in the subfacet on subcategory:

json.facet={
  categories:{terms:{
    field:category,
    sort:"total_size desc",
    facet:{
      total_size:"sum(size)",
      subcat:{terms:{ // this will facet on the subcategory field for each bucket
        field:subcategory,
        facet:{
          sz:"sum(size)"  // this calculates the sum per sub-cat bucket
        }
      }}
    }
  }}
}

So the above will give you the sum(size) at both the category and subcategory levels. Documentation for the new facet module is currently at http://yonik.com/json-facet-api/
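To get from the facet response to the nested category/subcategory shape asked for in the question, you can walk the buckets on the client side. The sketch below assumes the response follows the `facets` / `buckets` / `val` layout described in the JSON Facet API documentation and uses the facet names from the request above (`categories`, `total_size`, `subcat`, `sz`). The sample data is made up for illustration and is not real Solr output.

```python
def nested_sums(response):
    """Map category -> {"total": sum, "subcategories": {subcategory: sum}}."""
    result = {}
    categories = response.get("facets", {}).get("categories", {})
    for cat in categories.get("buckets", []):
        # Each category bucket carries its own nested "subcat" facet.
        subs = {
            sub["val"]: sub.get("sz")
            for sub in cat.get("subcat", {}).get("buckets", [])
        }
        result[cat["val"]] = {
            "total": cat.get("total_size"),
            "subcategories": subs,
        }
    return result


# Made-up sample shaped like a JSON Facet API response; numbers are illustrative only.
sample = {
    "facets": {
        "count": 3,
        "categories": {
            "buckets": [
                {
                    "val": "X",
                    "count": 3,
                    "total_size": 50.0,
                    "subcat": {
                        "buckets": [
                            {"val": "X_A", "count": 2, "sz": 30.0},
                            {"val": "X_B", "count": 1, "sz": 20.0},
                        ]
                    },
                }
            ]
        },
    }
}

print(nested_sums(sample))
```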

Yonik answered Mar 29 '23 23:03
