
Unique count of terms aggregations

I want to count distinct values of a field from my dataset. For example:

The terms aggregation gives me the number of occurrences per username. I only want to count the unique usernames, not every occurrence.

Here's my request:

POST appzz/messages/_search
{
   "aggs": {
      "words": {
         "terms": {
            "field": "username"
         }
      }
   },
   "size": 0,
   "from": 0
}

Is there a unique option or something like that?

asked Feb 14 '14 15:02 by Sandro Munda



3 Answers

You're looking for the cardinality aggregation, which was added in Elasticsearch 1.1. It lets you send a request like this:

{
  "aggs" : {
      "unique_users" : {
          "cardinality" : {
              "field" : "username"
          }
      }
  }
}
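
To show where the number ends up, here is a minimal Python sketch that reads the result of the request above. The response dict is a hand-written sample (the value 42 and the hit total are made up), and `unique_count` is a hypothetical helper, not part of any Elasticsearch client.

```python
# Hand-written sample of what a response to the request above looks like.
# The numbers are invented for illustration only.
sample_response = {
    "hits": {"total": 1000, "hits": []},
    "aggregations": {"unique_users": {"value": 42}},
}


def unique_count(response, agg_name="unique_users"):
    """Return the distinct count reported by a cardinality aggregation.

    The aggregation name must match the one used in the request body.
    """
    return response["aggregations"][agg_name]["value"]


print(unique_count(sample_response))
```

The count sits under the aggregation name chosen in the request (`unique_users` here), in a single `value` field.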
answered Nov 04 '22 13:11 by DerMiggel


We had a long discussion about it with one of the ES guys at a recent Elasticsearch meetup here. The short answer is no, there isn't. And according to him, it's not something to expect soon.

One workaround is to request all the terms (with a really big size limit) and count how many terms are returned, but it's expensive and not really viable if you have a lot of unique terms.
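
A rough Python sketch of that workaround, under a few assumptions: `terms_request` and `count_buckets` are hypothetical helpers, the sample response is hand-written, and the `sum_other_doc_count` check only helps on Elasticsearch versions that return that field (it is treated as 0 when absent).

```python
def terms_request(field, size):
    """Build a terms aggregation body that asks for up to `size` buckets."""
    return {
        "size": 0,  # no search hits needed, only the aggregation
        "aggs": {"words": {"terms": {"field": field, "size": size}}},
    }


def count_buckets(response, agg_name="words"):
    """Count returned buckets and report whether the list looks complete.

    If sum_other_doc_count is non-zero, some terms did not fit into `size`,
    so the bucket count is only a lower bound on the number of unique terms.
    """
    agg = response["aggregations"][agg_name]
    complete = agg.get("sum_other_doc_count", 0) == 0
    return len(agg["buckets"]), complete


# Hand-written sample response; usernames and counts are invented.
sample_response = {
    "aggregations": {
        "words": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "alice", "doc_count": 12},
                {"key": "bob", "doc_count": 7},
                {"key": "carol", "doc_count": 1},
            ],
        }
    }
}

print(terms_request("username", 10000))
print(count_buckets(sample_response))
```

Every unique term becomes a bucket that has to be sent back and held in memory, which is why this gets expensive as the number of unique terms grows.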

answered Nov 04 '22 15:11 by Rotem Hermon


@DerMiggel: I tried using cardinality for my project. Surprisingly, on my local system, with a total dump of some 200,000 documents, I ran the cardinality aggregation with a precision_threshold of 100, 0 and 40,000 (the maximum value). The first two runs gave different results (counts of 175 and 184 respectively), and with 40,000 I got an out-of-memory exception. The computation time was also huge compared to other aggregations. Hence I feel cardinality is not actually that accurate and might crash your system when high accuracy and precision are required.
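
For reference, a minimal Python sketch of how such a request body could be built with an explicit precision_threshold. `cardinality_request` is a hypothetical helper, and the 40,000 upper limit is taken from the maximum mentioned above; the sketch only builds the body and does not reproduce the counts or the memory behaviour described.

```python
import json

# Upper limit as mentioned in the answer above; treat it as an assumption
# for whichever Elasticsearch version you are running.
MAX_PRECISION_THRESHOLD = 40000


def cardinality_request(field, precision_threshold=None, agg_name="unique_users"):
    """Build a cardinality aggregation body, optionally with precision_threshold."""
    cardinality = {"field": field}
    if precision_threshold is not None:
        if not 0 <= precision_threshold <= MAX_PRECISION_THRESHOLD:
            raise ValueError("precision_threshold must be between 0 and 40000")
        cardinality["precision_threshold"] = precision_threshold
    return {"size": 0, "aggs": {agg_name: {"cardinality": cardinality}}}


# Same field as in the question, with the first threshold tried above.
print(json.dumps(cardinality_request("username", 100), indent=2))
```

A higher threshold trades memory for accuracy, so it makes sense to start low and raise it only as far as the accuracy you need requires.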

answered Nov 04 '22 14:11 by piyushGoyal