Trouble with facet counts

Tags:

elasticsearch

I'm attempting to use ElasticSearch for analytics -- specifically to track "top content" for a hand-rolled Rails CMS. The requirement is quite a bit more complicated than keeping a counter for each piece of content. I won't get into the depth of the problem right now, as I can't seem to get even the basics working.

My problem is this: I'm using facets and the counts aren't what I expect them to be. For example:

Query:

{"facets":{"el_ids":{"terms":{"field":"el_id","size":1,"all_terms":false,"order":"count"}}}}

Result:

{"el_ids":{"_type":"terms","missing":0,"total":16672,"other":16657,"terms":[{"term":"quis","count":15}]}}

Ok, great, the piece of content with id "quis" had 15 hits, and since the order is count, it should be my top piece of content. Now let's get the top 5 pieces of content.

Query:

{"facets":{"el_ids":{"terms":{"field":"el_id","size":5,"all_terms":false,"order":"count"}}}}

Result (just the facet):

[
  {"term":"qgz9","count":26},
  {"term":"quis","count":15},
  {"term":"hnqn","count":15},
  {"term":"higp","count":15},
  {"term":"csns","count":15}
]

Huh? So the piece of content w/ id "qgz9" had more hits with 26? Why wasn't it the top result in the first query?

Ok, let's get the top 100 now.

Query:

{"facets":{"el_ids":{"terms":{"field":"el_id","size":100,"all_terms":false,"order":"count"}}}}

Results (just the facet):

[
  {"term":"qgz9","count":43},
  {"term":"difc","count":37},
  {"term":"zryp","count":31},
  {"term":"u65r","count":31},
  {"term":"sxsi","count":31},
  ...
]

So now "qgz9" has 43 hits instead of 26? How can that be? I can assure you there's nothing happening in the background modifying the index. If I repeat these queries, I get the same results.

As I repeat this process of increasing the result size, counts continue to change and new content ids emerge at the top. Can someone explain to me what I'm doing wrong or where my understanding of how this works is flawed?


asked Jul 07 '12 12:07

Derek Harmel

1 Answer

It turns out that this is a known issue:

...the way top N facets work now is by getting the top N from each shard, and merging the results. This can give inaccurate results.

By default, my index was being created with 5 shards. After changing this so the index has only a single shard, the counts are in line with my expectations. Another workaround would be to always set size to a value greater than the number of distinct terms expected in the field and peel off the top N results.
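To make the quoted behaviour concrete, here is a minimal Python sketch that simulates it. It does not talk to ElasticSearch at all. The three shards and their per-term counts are invented for illustration (only the ids are borrowed from the question), and `exact_top_n` and `merged_shard_top_n` are hypothetical helper names. The only assumption about ElasticSearch is the one in the quote: each shard returns its own top N terms and the results are summed.

```python
from collections import Counter

# Hypothetical per-shard term counts, made up for this illustration.
SHARDS = [
    {"quis": 15, "qgz9": 9, "difc": 8},
    {"hnqn": 12, "qgz9": 10, "difc": 9},
    {"higp": 11, "csns": 10, "qgz9": 8, "difc": 7},
]


def exact_top_n(shards, n):
    """Sum every term across all shards, then take the top n (the true answer)."""
    total = Counter()
    for shard in shards:
        total.update(shard)
    return total.most_common(n)


def merged_shard_top_n(shards, n):
    """Each shard reports only its own top n terms; the reported counts are summed."""
    merged = Counter()
    for shard in shards:
        # Terms outside this shard's local top n are never sent, so their
        # counts on this shard are silently dropped from the merged total.
        merged.update(dict(Counter(shard).most_common(n)))
    return merged.most_common(n)


if __name__ == "__main__":
    for size in (1, 2, 3):
        print(size, merged_shard_top_n(SHARDS, size), "exact:", exact_top_n(SHARDS, size))
    # Workaround from the answer: request far more terms than exist, keep the top 2.
    print(merged_shard_top_n(SHARDS, 100)[:2])
```

With this made-up data, size 1 reports "quis" as the top term even though "qgz9" has the highest true total, and the reported count for "qgz9" grows as size increases, which is the same pattern as in the question. Requesting more terms than exist and slicing afterwards gives the exact counts, at the cost of each shard returning every term.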

answered Sep 28 '22 05:09

Derek Harmel
