I want to count distinct values of a field from my dataset. The terms aggregation gives me the number of occurrences by username, but I only want to count the unique usernames, not every occurrence.
Here's my request:
POST appzz/messages/_search
{
  "aggs": {
    "words": {
      "terms": {
        "field": "username"
      }
    }
  },
  "size": 0,
  "from": 0
}
Is there a unique option or something like that?
Cardinality aggregation: a single-value metrics aggregation that calculates an approximate count of distinct values. Assume you are indexing store sales and would like to count the unique number of sold products that match a query: POST /sales/_search?
The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body. The count API supports multi-target syntax. You can run a single count API search across multiple data streams and indices. The operation is broadcast across all shards.
You're looking for the cardinality aggregation which was added in Elasticsearch 1.1. It allows you to request something like this:
{
  "aggs" : {
    "unique_users" : {
      "cardinality" : {
        "field" : "username"
      }
    }
  }
}
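To make the request above concrete, here is a minimal Python sketch that builds the same body and reads the count out of a response. The helper names (`build_unique_count_request`, `read_unique_count`) and the sample response are made up for illustration. The only parts taken from Elasticsearch itself are the `cardinality` aggregation, its optional `precision_threshold` parameter, and the `aggregations.<name>.value` shape of the result. Sending the body to the cluster is left out on purpose, since that depends on your client.

```python
import json


def build_unique_count_request(field, agg_name="unique_users", precision_threshold=None):
    """Build a search body that asks only for an approximate distinct count.

    Hypothetical helper: the names are illustrative, not part of any client library.
    """
    cardinality = {"field": field}
    if precision_threshold is not None:
        # Optional accuracy/memory trade-off knob of the cardinality aggregation.
        cardinality["precision_threshold"] = precision_threshold
    # "size": 0 suppresses the hits, as in the original request from the question.
    return {"size": 0, "aggs": {agg_name: {"cardinality": cardinality}}}


def read_unique_count(response, agg_name="unique_users"):
    """Extract the approximate distinct count from a search response dict."""
    return response["aggregations"][agg_name]["value"]


body = build_unique_count_request("username", precision_threshold=1000)
print(json.dumps(body, indent=2))

# Hand-written sample response (an assumption, not real cluster output).
sample_response = {
    "hits": {"total": 5, "hits": []},
    "aggregations": {"unique_users": {"value": 3}},
}
print(read_unique_count(sample_response))
```

The printed body is what you would POST to appzz/messages/_search in place of the terms request.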
We had a long discussion about it with one of the ES guys in a recent Elasticsearch meetup we had here. The short answer is no, there isn't. And according to him it's not something to be expected soon.
One option to kind of do it is to get all the terms (give a really big size limit) and count how many terms are returned, but it's expensive and not really valid if you have a lot of unique terms.
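As a rough sketch of that workaround in Python: request the terms aggregation with a large `size`, then count the buckets that come back. The helper names and the sample response are hypothetical. The check on `sum_other_doc_count` assumes a version of Elasticsearch that includes that field in the terms response. When it is greater than zero, some terms were cut off and the bucket count is only a lower bound.

```python
def build_terms_request(field, max_terms, agg_name="words"):
    """Build a terms request that tries to return every distinct term.

    Hypothetical helper; max_terms must exceed the real number of distinct values.
    """
    return {
        "size": 0,
        "aggs": {agg_name: {"terms": {"field": field, "size": max_terms}}},
    }


def count_terms(response, agg_name="words"):
    """Return (number of buckets, whether the bucket list was truncated)."""
    agg = response["aggregations"][agg_name]
    bucket_count = len(agg["buckets"])
    # Documents left out of the returned buckets mean more terms exist.
    truncated = agg.get("sum_other_doc_count", 0) > 0
    return bucket_count, truncated


request_body = build_terms_request("username", max_terms=100000)

# Hand-written sample response (an assumption, not real cluster output).
sample_terms_response = {
    "aggregations": {
        "words": {
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "alice", "doc_count": 3},
                {"key": "bob", "doc_count": 2},
            ],
        }
    }
}
unique_count, was_truncated = count_terms(sample_terms_response)
print(unique_count, was_truncated)
```

The cost mentioned above shows up here directly: every distinct term has to be shipped back to the client just to be counted.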
@DerMiggel: I tried using cardinality for my project. Surprisingly, on my local system, with a total dump of some 200,000 documents, I ran cardinality with a precision_threshold of 100, 0, and 40,000 (the maximum value). The first two runs gave different results (counts of 175 and 184 respectively), and with 40,000 I got an out-of-memory exception. The computation time was also huge compared to other aggregations. Hence I feel cardinality is not actually that accurate and might crash your system when high accuracy and precision are required.