
Solr facet sum instead of count

Tags: solr, lucene

I'm new to Solr and I'm interested in implementing a special facet.

Sample documents:

{ hostname: google.com, time_spent: 100 }
{ hostname: facebook.com, time_spent: 10 }
{ hostname: google.com, time_spent: 30 }
{ hostname: reddit.com, time_spent: 20 }
...

I would like to return a facet with the following structure:

{ google.com: 130, reddit.com: 20, facebook.com: 10 }

Solr's actual response is much more verbose than this. The important point is that the "count" for each facet value is the sum of the time_spent values of the matching documents, rather than the number of documents matching the facet.

Idea #1:

I could use a pivot:

q=*:*
&facet=true
&facet.pivot=hostname,time_spent

However, this returns the count of each unique time_spent value for every unique hostname. I could sum these up manually in my application, but that seems wasteful.
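For reference, the manual summation mentioned in Idea #1 could look roughly like the sketch below. It assumes the usual pivot response shape, where each hostname entry carries a nested "pivot" list of time_spent values with their counts. The function name sum_time_by_host and the sample data are made up for illustration.

```python
def sum_time_by_host(pivot_entries):
    """Sum value * count over the nested time_spent pivot of each hostname."""
    totals = {}
    for host in pivot_entries:
        leaves = host.get("pivot") or []
        totals[host["value"]] = sum(leaf["value"] * leaf["count"] for leaf in leaves)
    return totals


# Hypothetical slice of response["facet_counts"]["facet_pivot"]["hostname,time_spent"]
sample = [
    {"field": "hostname", "value": "google.com", "count": 2, "pivot": [
        {"field": "time_spent", "value": 100, "count": 1},
        {"field": "time_spent", "value": 30, "count": 1},
    ]},
    {"field": "hostname", "value": "facebook.com", "count": 1, "pivot": [
        {"field": "time_spent", "value": 10, "count": 1},
    ]},
    {"field": "hostname", "value": "reddit.com", "count": 1, "pivot": [
        {"field": "time_spent", "value": 20, "count": 1},
    ]},
]

totals = sum_time_by_host(sample)
# Order hostnames by descending total time spent.
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

This still pulls every (hostname, time_spent) pair over the wire, so it only makes sense for small result sets.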

Idea #2:

I could use the stats module:

q=*:*
&stats=true
&stats.field=time_spent
&stats.facet=hostname

However, this has two issues. First, the returned results contain all the hostnames, which is really problematic because my dataset has over 1 million hostnames. Second, the returned results are unsorted, and I need to render the hostnames in order of descending total time spent.
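To make the second issue concrete, here is a sketch of sorting the stats output on the client. It assumes the per-hostname sums sit under stats.stats_fields.time_spent.facets.hostname in the response. The helper name top_hosts_from_stats and the sample response are hypothetical.

```python
import heapq


def top_hosts_from_stats(response, n=10):
    """Return the n hostnames with the largest summed time_spent."""
    per_host = response["stats"]["stats_fields"]["time_spent"]["facets"]["hostname"]
    pairs = ((host, stats["sum"]) for host, stats in per_host.items())
    # nlargest avoids fully sorting all hostnames when only the top n are needed.
    return heapq.nlargest(n, pairs, key=lambda kv: kv[1])


# Hypothetical, trimmed-down stats response for the sample documents.
sample_response = {
    "stats": {"stats_fields": {"time_spent": {
        "sum": 160.0,
        "facets": {"hostname": {
            "google.com": {"sum": 130.0, "count": 2},
            "facebook.com": {"sum": 10.0, "count": 1},
            "reddit.com": {"sum": 20.0, "count": 1},
        }},
    }}}
}

print(top_hosts_from_stats(sample_response, n=2))
```

This only fixes the ordering. Solr still returns an entry for every hostname, so the first issue remains.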

Your help with this would be really appreciated!

Thanks!

advait asked Aug 13 '14 23:08



1 Answer

With Solr >=5.1, this is possible:

Facet Sorting

The default sort for a field or terms facet is by bucket count descending. We can optionally sort ascending or descending by any facet function that appears in each bucket. For example, if we wanted to find the top buckets by average price, then we would add sort:"x desc" to the previous facet request:

$ curl http://localhost:8983/solr/query -d 'q=*:*&
 json.facet={
   categories:{
     type : terms,
     field : cat,
     sort : "x desc",   // can also use sort:{x:desc}
     facet:{
       x : "avg(price)",
       y : "sum(price)"
     }
   }
 }
'

See Yonik's Blog: http://yonik.com/solr-facet-functions/

For your use case this would be:

json.facet={
  hostname_time:{
    type: terms,
    field: hostname,
    sort: "time_total desc",
    facet:{
      time_total: "sum(time_spent)",
    }
  }
}

Calling sum() in nested facets worked for us only in 6.3.0.
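A rough Python sketch of building this request and reading the buckets back, assuming the standard JSON Facet response shape (facets -> hostname_time -> buckets, each bucket carrying val, count and the named aggregation). The helper names are made up. The limit and rows values are my own additions to keep the response small, not part of the request above.

```python
import json


def build_params(limit=10):
    """Request parameters for a terms facet sorted by summed time_spent."""
    facet = {
        "hostname_time": {
            "type": "terms",
            "field": "hostname",
            "sort": "time_total desc",
            "limit": limit,  # assumption: only the top `limit` hostnames are wanted
            "facet": {"time_total": "sum(time_spent)"},
        }
    }
    # rows=0 skips the document list, since only the facet is of interest here.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def totals_from_response(response):
    """Extract (hostname, total time) pairs in the order Solr returned them."""
    buckets = response.get("facets", {}).get("hostname_time", {}).get("buckets", [])
    return [(b["val"], b["time_total"]) for b in buckets]


# Hypothetical response for the sample documents in the question.
sample_response = {
    "facets": {
        "count": 4,
        "hostname_time": {"buckets": [
            {"val": "google.com", "count": 2, "time_total": 130.0},
            {"val": "reddit.com", "count": 1, "time_total": 20.0},
            {"val": "facebook.com", "count": 1, "time_total": 10.0},
        ]},
    }
}

print(build_params()["json.facet"])
print(totals_from_response(sample_response))
```

The params dict can be posted as form data to the /query handler with any HTTP client, the same way the curl example does.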

Risadinha answered Oct 19 '22 02:10
