In Elasticsearch, we have used the terms facet and the terms aggregation to deal with the above-mentioned problem. Unfortunately, this only works reliably for a small data set, and we are dealing with around 10 million documents.
When we query for all the unique values of a field (e.g. the company field) using an aggregation (setting "size": 0) or a facet (using "exclude"), we cannot get the entire result in one request. Elasticsearch takes a long time to respond, and it ultimately results in node failure.
The sole purpose of this process is to count how many unique values are present in a field (e.g. for the company field, the count of unique companies).
Any suggestions would be appreciated.
If you use Elasticsearch 1.1.0 or above, you can estimate the distinct count with the cardinality aggregation.
A simple query would look like this in your case:
POST /{yourIndex}/{yourType}/_search
{
  "aggs": {
    "company_count": {
      "cardinality": {
        "field": "company.company_raw",
        "precision_threshold": 10000
      }
    }
  }
}
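The estimated count then comes back under the aggregation's name in the response. A trimmed response would look roughly like this (the numbers are purely illustrative; note that "value" is an approximation, computed with the HyperLogLog++ algorithm, though it is near-exact for cardinalities below the precision_threshold):

{
  "took": 12,
  "hits": { "total": 10000000, ... },
  "aggregations": {
    "company_count": {
      "value": 8512
    }
  }
}

Because the cardinality aggregation works on a fixed-memory sketch rather than collecting every unique term, it avoids the memory blow-up that makes the terms aggregation with "size": 0 fail on 10 million documents.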