
Elasticsearch document count returned by _stats versus _count

I'm trying to get statistics and counts for the indices in my Elasticsearch cluster (1.2.1). I was using the Indices Stats API (_stats endpoint) to get the total number of primary documents and their size on disk. However, when I started experimenting with the Count API (_count endpoint), I noticed that the two values do not match.

What is the difference between these values? It's not entirely clear from the documentation, though one clue there indicates that the value returned by Indices Stats can change when the index is refreshed. This makes me wonder whether it is a lower-level value from the Lucene layer.

Indices Stats API

localhost:9200/my_index/_stats

...snip...

"_all" : {
  "primaries" : {
    "docs" : {
      "count" : 8284,
      "deleted" : 87
    },
  }
}

...snip...

Count API

localhost:9200/my_index/_count

{
  "count" : 6854,
  "_shards" : {
    "total" : 40,
    "successful" : 40,
    "failed" : 0
  }
}
asked Mar 25 '15 22:03 by AlexMaskovyak




1 Answer

Actually, the docs.count you get back from the Indices Stats API also includes the nested documents present in the index, so it will always be greater than or equal to the count you get back from the Count API. The Count API only returns the number of top-level documents, i.e. documents that would be returned from a search query.

So, judging by the numbers you posted, it looks like your index contains documents with fields whose type is nested in the mapping. Sounds correct?
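
To illustrate the difference, here is a rough Python sketch. It is not Elasticsearch code: the function names, result keys, and sample documents are made up for this example, and it assumes that every object under a field mapped as nested is stored as its own hidden Lucene document next to its parent.

```python
from typing import Any


def nested_object_count(obj: dict[str, Any], nested_paths: set[str], prefix: str = "") -> int:
    """Count the objects in `obj` that sit under a field mapped as nested.

    `nested_paths` holds dotted field paths (hypothetical mapping info),
    e.g. {"comments", "comments.replies"}.
    """
    count = 0
    for key, value in obj.items():
        path = f"{prefix}{key}"
        # A field may hold a single object or a list of objects.
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                if path in nested_paths:
                    count += 1  # assumed: one hidden Lucene document per nested object
                # Look for nested fields deeper inside this object.
                count += nested_object_count(item, nested_paths, path + ".")
    return count


def simulate_counts(docs: list[dict[str, Any]], nested_paths: set[str]) -> dict[str, int]:
    """Return the two counts as this simplified model predicts them."""
    top_level = len(docs)
    hidden = sum(nested_object_count(doc, nested_paths) for doc in docs)
    return {"count_api": top_level, "stats_docs_count": top_level + hidden}


sample_docs = [
    {"title": "a", "comments": [{"text": "x"}, {"text": "y"}]},
    {"title": "b", "comments": [{"text": "z"}]},
    {"title": "c"},
]

print(simulate_counts(sample_docs, {"comments"}))
```

In this model the three sample documents carry three nested comment objects, so the simulated stats count is twice the simulated Count API value. If the same explanation holds for your index, the gap in your numbers (8284 - 6854 = 1430) would correspond to the number of nested objects.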

answered Oct 09 '22 11:10 by Val
