
Apache Solr: sum of a numeric field grouped by another field

Tags:

solr

lucene

We have a requirement where we need to group our records by a particular field and take the sum of a corresponding numeric field,

e.g. select userid, sum(click_count) from user_action group by userid;

We are trying to do this using Apache Solr and found two ways of doing it:

  1. Using the field collapsing feature (http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/), but we found two problems with this:
     1.1. It is not part of a release and is only available as a patch, so we are not sure if we can use it in production.
     1.2. We do not get the sum back, only individual counts, which we would need to sum on the client side.

  2. Using the Stats Component along with faceted search (http://wiki.apache.org/solr/StatsComponent). This meets our requirement but it is not fast enough for very large data sets.

I just wanted to know if anybody knows of any other way to achieve this. Appreciate any help.

Thanks,

Terance.

Terance Dias asked Jun 03 '10 12:06



1 Answer

Why not use the StatsComponent instead? It is available from Solr 1.4 onward.

$ curl 'http://search/select?q=*&rows=0&stats=on&stats.field=click_count' |
  tidy -xml -indent -quiet -wrap 2000000

<?xml version="1.0" encoding="utf-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">17</int>
    <lst name="params">
      <str name="q">*</str>
      <str name="stats">on</str>
      <arr name="stats.field">
        <str>click_count</str>
      </arr>
      <str name="rows">0</str>
    </lst>
  </lst>
  <result name="response" numFound="577" start="0" />
  <lst name="stats">
    <lst name="stats_fields">
      <lst name="click_count">
        <double name="min">1.0</double>
        <double name="max">3487.0</double>
        <double name="sum">47912.0</double>
        <long name="count">577</long>
        <long name="missing">0</long>
        <double name="sumOfSquares">4.0208702E7</double>
        <double name="mean">83.0363951473137</double>
        <double name="stddev">250.79824725438448</double>
      </lst>
    </lst>
  </lst>
</response>
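
The response above gives one sum over all matching documents. To get a sum per userid, as in the SQL in the question, the StatsComponent also accepts a stats.facet parameter (this is the "Stats Component along with faceted search" option the question mentions, so the same performance concern on very large data sets applies). Below is a minimal Python sketch, not a tested client: the function names are made up for illustration, `q=*:*` is assumed as the match-all query, and the nested `facets` layout is assumed to follow the format shown on the StatsComponent wiki page.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_stats_query(base_url, stats_field, facet_field=None):
    """Build a StatsComponent request URL (hypothetical helper).

    When facet_field is given, stats.facet asks Solr for the same
    statistics broken down by each value of that field.
    """
    params = [
        ("q", "*:*"),          # assumption: match-all query
        ("rows", "0"),         # only the stats are needed, not the documents
        ("stats", "true"),
        ("stats.field", stats_field),
    ]
    if facet_field is not None:
        params.append(("stats.facet", facet_field))
    return base_url + "?" + urlencode(params)


def parse_sums(xml_text, stats_field):
    """Return (overall_sum, {group_value: sum}) from a stats XML response."""
    root = ET.fromstring(xml_text)
    field = root.find(
        "./lst[@name='stats']/lst[@name='stats_fields']"
        f"/lst[@name='{stats_field}']"
    )
    if field is None:
        raise ValueError(f"no stats found for field {stats_field!r}")

    overall = float(field.find("./double[@name='sum']").text)

    # Assumed layout: <lst name="facets"><lst name="userid"><lst name="VALUE">
    per_group = {}
    for facet in field.findall("./lst[@name='facets']/lst"):
        for group in facet.findall("./lst"):
            group_sum = group.find("./double[@name='sum']")
            if group_sum is not None:
                per_group[group.get("name")] = float(group_sum.text)
    return overall, per_group
```

Calling `build_stats_query("http://search/select", "click_count", "userid")` yields a URL that can be fetched with curl as above, and `parse_sums(response_text, "click_count")` returns the overall sum together with a dictionary of per-userid sums (empty when no stats.facet was requested).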
Marco Lazzeri answered Sep 28 '22 13:09
