How do I search for all unique values of a given field with Elasticsearch?
I need the equivalent of a SQL query like select full_name from authors, so I can display the list to the users on a form.
Set your aggregation back to Count and add a Split Rows bucket as follows. If you skip this, each field value will show a count of 1 when you populate the table (since it is looking for unique counts). The noteworthy part is setting the Top field to 0, because Kibana won't let you enter anything other than a digit (obviously!).
You can use Visual Builder to show the number of duplicates per bucket. The metric then shows the number of duplicates in the latest time interval. If you want to show the total number of duplicates, the accurate way is to increase the bucket interval until it basically contains all the data.
Elasticsearch Aggregations provide you with the ability to group and perform calculations and statistics (such as sums and averages) on your data by using a simple search query. An aggregation can be viewed as a working unit that builds analytical information across a set of documents.
For Elasticsearch 1.0 and later, you can use a terms aggregation to do this. Query DSL:
{ "aggs": { "NAME": { "terms": { "field": "", "size": 10 } } } }
A real example:
{ "aggs": { "full_name": { "terms": { "field": "authors", "size": 0 } } } }
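This returns all unique values of the authors field. size=0 means the number of terms is not limited (this requires Elasticsearch 1.1.0 or later). Note that size=0 is no longer accepted by terms aggregations as of Elasticsearch 5.0; on those versions, set an explicit size large enough to cover the expected number of unique values.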
Response:
{ ... "aggregations" : { "full_name" : { "buckets" : [ { "key" : "Ken", "doc_count" : 10 }, { "key" : "Jim Gray", "doc_count" : 10 } ] } } }
See the Elasticsearch terms aggregation documentation.
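For anyone wiring this into application code, here is a minimal Python sketch of the same idea: build the terms aggregation body and pull the bucket keys out of the response. The helper names, the aggregation name unique_values, and the top-level "size": 0 (which only suppresses the search hits) are my own assumptions and not part of the answer above. Sending the body to the cluster is left out, and the sample response simply mirrors the one shown above.

```python
import json


def build_unique_values_query(field, size=10):
    """Build a search body with a terms aggregation on `field`.

    `size` inside the terms block caps how many distinct terms come back.
    """
    return {
        # Assumption: top-level size 0 skips the hits, since only the
        # aggregation results are needed. This is NOT the terms size.
        "size": 0,
        "aggs": {
            "unique_values": {  # hypothetical aggregation name
                "terms": {"field": field, "size": size}
            }
        },
    }


def extract_unique_values(response, agg_name="unique_values"):
    """Return the bucket keys (the distinct field values) from a response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


# Sample response shaped like the one shown in the answer.
sample_response = {
    "aggregations": {
        "unique_values": {
            "buckets": [
                {"key": "Ken", "doc_count": 10},
                {"key": "Jim Gray", "doc_count": 10},
            ]
        }
    }
}

body = build_unique_values_query("authors", size=100)
print(json.dumps(body, indent=2))
print(extract_unique_values(sample_response))
```

The printed JSON body is what you would POST to the index's _search endpoint with whatever HTTP or Elasticsearch client you already use.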
You could build a terms facet on your 'full_name' field. To do that properly, make sure the field is not tokenized at index time; otherwise every entry in the facet will be an individual term from the field content rather than the full value. You most likely need to configure it as 'not_analyzed' in your mapping. If you also search on the field and still want it tokenized, you can index it both ways using a multi field.
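As a rough illustration of the multi field suggestion, the sketch below builds a mapping in which full_name stays analyzed for search and gains an untokenized sub-field for faceting or aggregating. The sub-field name raw and the helper name are hypothetical. The string / not_analyzed syntax matches the older Elasticsearch versions this answer targets; newer versions replace it with the keyword field type, so treat this as a sketch under that assumption.

```python
import json


def build_full_name_mapping(field="full_name", subfield="raw"):
    """Build a mapping with an analyzed field plus a not_analyzed sub-field.

    Assumption: legacy string mapping syntax (pre-5.0 Elasticsearch).
    """
    return {
        "properties": {
            field: {
                "type": "string",  # analyzed: used for full-text search
                "fields": {
                    # untokenized copy: used for unique values
                    subfield: {"type": "string", "index": "not_analyzed"}
                },
            }
        }
    }


mapping = build_full_name_mapping()
# The facet or aggregation would then target the sub-field by its full path.
aggregation_field = "full_name.raw"
print(json.dumps(mapping, indent=2))
```

With a mapping like this, searches run against full_name while the unique-value request points at full_name.raw, so names such as "Jim Gray" come back as one entry instead of two tokens.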
You also need to take into account that depending on the number of unique terms that are part of the full_name field, this operation can be expensive and require quite some memory.