 

Elasticsearch, how to get all unique values of a field and count of total unique values?

In Elasticsearch, we have used the terms facet and the terms aggregation to tackle the problem above. Unfortunately, these only work well for small data sets, and we are dealing with around 10 million documents.

Hence, when we query for all the unique values of a field (e.g. the company field) using an aggregation (setting "size": 0) or a facet (using "exclude"), we cannot get the entire result in one pass. Elasticsearch seems to take a long time to respond, and this ultimately results in node failure.

The sole purpose of this process is to count how many unique values a field contains (e.g. for the company field, the number of unique companies).

Any suggestions would be appreciated.

Shastry asked Oct 20 '22 03:10



1 Answer

If you use Elasticsearch 1.1.0 or above, you can estimate the distinct count with the cardinality aggregation.
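The cardinality aggregation returns an approximation rather than an exact count; the Elasticsearch reference describes it as based on the HyperLogLog++ algorithm. To show why such an estimate needs only a fixed amount of memory no matter how many distinct values exist, here is a minimal plain HyperLogLog sketch in Python. It is an illustration of the general technique, not Elasticsearch's implementation: the register count (`p=14`), the SHA-1 based hash and the class name are all assumptions made for this example.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative plain HyperLogLog (not Elasticsearch's HyperLogLog++)."""

    def __init__(self, p: int = 14):
        self.p = p                  # assumed precision: 2**p registers
        self.m = 1 << p
        self.registers = [0] * self.m
        # Standard bias-correction constant alpha_m, valid for m >= 128
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value: str) -> None:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (assumed choice)
        h = int.from_bytes(hashlib.sha1(value.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self) -> int:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        # Small-range correction: fall back to linear counting
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)
```

Memory stays at 16384 small registers whether the field holds ten thousand or ten million companies, and adding a value twice leaves the registers unchanged, which is what makes the count "distinct".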

A simple query would look like this in your case:

POST /{yourIndex}/{yourType}/_search
{
    "size": 0,
    "aggs" : {
        "company_count" : {
            "cardinality" : {
                "field" : "company.company_raw",
                "precision_threshold": 10000
            }
        }
    }
}
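If you want to run the same request from code, the following Python sketch uses only the standard library (`urllib.request`) to send it and read the estimate from `aggregations.company_count.value`, which is where a cardinality aggregation puts its result. The cluster URL `http://localhost:9200` and the function names are hypothetical; `"size": 0` asks Elasticsearch to return no hits, only the aggregation.

```python
import json
import urllib.request

# Hypothetical cluster address; adjust to your setup.
ES_URL = "http://localhost:9200"


def build_cardinality_query(field: str, precision_threshold: int = 10000) -> dict:
    """Request body for an approximate distinct count on `field`."""
    return {
        "size": 0,  # skip the hits, we only want the aggregation
        "aggs": {
            "company_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def extract_count(response: dict, agg_name: str = "company_count") -> int:
    """Read the estimated distinct count from a search response."""
    return response["aggregations"][agg_name]["value"]


def distinct_count(index: str, doc_type: str, field: str) -> int:
    """POST the query to /{index}/{doc_type}/_search and return the estimate."""
    body = json.dumps(build_cardinality_query(field)).encode("utf-8")
    req = urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_count(json.load(resp))
```

Usage would be something like `distinct_count("myindex", "mytype", "company.company_raw")`. Per the Elasticsearch reference, counts below `precision_threshold` are expected to be close to exact, and the setting is capped at 40000; on versions without mapping types, drop the `{doc_type}` path segment.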
Martin Seeler answered Oct 23 '22 11:10
