Find distinct values, not distinct counts in elasticsearch

The Elasticsearch documentation suggested* that this piece of code

*The documentation has since been fixed.

    GET /cars/transactions/_search?search_type=count
    {
      "aggs": {
        "distinct_colors": {
          "cardinality": {
            "field": "color"
          }
        }
      }
    }

corresponds to the SQL query

    SELECT DISTINCT(color) FROM cars

but it actually corresponds to

    SELECT COUNT(DISTINCT(color)) FROM cars

I don't want to know how many distinct values I have, but what the distinct values are. Does anyone know how to achieve that?
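The difference between the two SQL statements can be checked with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory `cars` table whose four rows are made up for illustration; it says nothing about Elasticsearch itself, only about what each query returns.

```python
import sqlite3

# In-memory database with a hypothetical cars table (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (color TEXT)")
conn.executemany(
    "INSERT INTO cars (color) VALUES (?)",
    [("red",), ("blue",), ("red",), ("green",)],
)

# What the question asks for: the distinct values themselves.
distinct_colors = sorted(
    row[0] for row in conn.execute("SELECT DISTINCT(color) FROM cars")
)

# What the cardinality aggregation corresponds to: only how many there are.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT(color)) FROM cars"
).fetchone()[0]

print(distinct_colors)  # ['blue', 'green', 'red']
print(distinct_count)   # 3
conn.close()
```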

jasiustasiu asked Jan 28 '15 10:01



2 Answers

Use a terms aggregation on the color field. Pay attention to how the field you want distinct values for is analyzed: make sure it is not tokenized at index time, otherwise each entry in the aggregation will be an individual term from the field content rather than the whole value.

If you still want tokenization AND want to use the terms aggregation, look at not_analyzed indexing for that field, possibly combined with multi-fields so that the analyzed and the not_analyzed versions are indexed side by side.
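As a rough sketch of the multi-field idea, the helper below builds the `properties` fragment of a mapping as a Python dict. The function name `color_field_properties` and the sub-field name `raw` are hypothetical choices for illustration. The `legacy` branch uses the `string` / `not_analyzed` syntax this answer refers to; the other branch assumes Elasticsearch 5 or later, where the `text` and `keyword` types replaced it.

```python
import json


def color_field_properties(legacy=True):
    """Build a mapping fragment with an analyzed color field and an exact sub-field.

    legacy=True  -> string / not_analyzed syntax (pre-5.x Elasticsearch)
    legacy=False -> text / keyword syntax (Elasticsearch 5 and later)
    """
    if legacy:
        analyzed_type = "string"
        exact_field = {"type": "string", "index": "not_analyzed"}
    else:
        analyzed_type = "text"
        exact_field = {"type": "keyword"}
    return {
        "color": {
            "type": analyzed_type,
            # "raw" is a hypothetical sub-field name; it keeps the untokenized value.
            "fields": {"raw": exact_field},
        }
    }


print(json.dumps(color_field_properties(legacy=True), indent=2))
```

With a mapping like this, the terms aggregation would target `color.raw` instead of `color`, while full-text queries can keep using `color`.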

Terms aggregation for cars:

    GET /cars/transactions/_search?search_type=count
    {
      "aggs": {
        "distinct_colors": {
          "terms": {
            "field": "color",
            "size": 1000
          }
        }
      }
    }
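The distinct values come back as the bucket keys of the aggregation. A minimal sketch of pulling them out of a parsed response follows; the `sample_response` dict is invented for illustration and only mimics the general shape of a terms aggregation result, and `distinct_values` is a hypothetical helper name.

```python
def distinct_values(response, agg_name):
    """Return the bucket keys of a terms aggregation from a parsed search response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Hypothetical response; a real one carries more metadata fields.
sample_response = {
    "hits": {"total": 8, "hits": []},
    "aggregations": {
        "distinct_colors": {
            "buckets": [
                {"key": "red", "doc_count": 4},
                {"key": "blue", "doc_count": 2},
                {"key": "green", "doc_count": 2},
            ]
        }
    },
}

print(distinct_values(sample_response, "distinct_colors"))  # ['red', 'blue', 'green']
```

Each bucket also carries a `doc_count`, so the same response gives the number of documents per color if that is needed.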
Andrei Stefan answered Sep 19 '22 10:09


To update the excellent answer from Andrei Stefan: the query parameter search_type=count is no longer supported in Elasticsearch 5. The new way of doing this is to add "size": 0 to the request body, as follows:

    GET /cars/transactions/_search
    {
      "size": 0,
      "aggs": {
        "distinct_colors": {
          "terms": {
            "field": "color",
            "size": 1000
          }
        }
      }
    }
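When the request is built from application code, the body can be assembled as a plain dict. The sketch below only constructs and prints the JSON and does not call any client library; the helper name `distinct_values_body` and its parameters are hypothetical.

```python
import json


def distinct_values_body(field, agg_name="distinct_values", max_terms=1000):
    """Build a search body that returns no hits, only a terms aggregation."""
    return {
        "size": 0,  # top-level size: suppress the search hits themselves
        "aggs": {
            agg_name: {
                # inner size: upper bound on the number of buckets returned
                "terms": {"field": field, "size": max_terms}
            }
        },
    }


body = distinct_values_body("color", agg_name="distinct_colors")
print(json.dumps(body, indent=2))
```

Note that the two `size` settings do different jobs: the top-level one hides the hits, while the one inside `terms` caps how many distinct values are returned, so a field with more than 1000 distinct values would be truncated with the setting shown.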
Ortomala Lokni answered Sep 19 '22 10:09