Counting number of documents using Elasticsearch

To count the number of documents in an Elasticsearch index, there seem to be at least two options:

  • Direct count

    POST my_index/_count

    should return the number of documents in my_index.

  • Using search

    Here one can set search_type to count, or use any other search type. In either case, the total can be read from the ['hits']['total'] field of the response.
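For illustration, the two response shapes can be read as sketched below. This is a minimal sketch and not part of the original question: the helper names and the sample responses are hypothetical, and the object form of hits.total assumes Elasticsearch 7 or later, where the total is reported as an object rather than a plain integer.

```python
def total_from_count(response: dict) -> int:
    """Read the document count from a _count response body."""
    return response["count"]


def total_from_search(response: dict) -> int:
    """Read the total hit count from a _search response body.

    Older versions return hits.total as an integer. Elasticsearch 7+
    returns an object such as {"value": 42, "relation": "eq"}; a
    relation of "gte" means the value is only a lower bound.
    """
    total = response["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


# Hypothetical response bodies, trimmed to the relevant fields.
count_response = {"count": 42}
old_search_response = {"hits": {"total": 42, "hits": []}}
new_search_response = {"hits": {"total": {"value": 42, "relation": "eq"}, "hits": []}}

print(total_from_count(count_response))
print(total_from_search(old_search_response))
print(total_from_search(new_search_response))
```

If the relation in a newer response is "gte", the search total is a lower bound and may differ from what _count reports.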

My questions are:

  • What is the difference between the two approaches? Which one should I prefer?

  • I ask because I'm seeing different results depending on the method I choose. I'm debugging the issue now, and this question came up.

asked Sep 09 '14 08:09 by Dror


1 Answer

_count is probably a bit faster, since it doesn't have to execute a full query with ranking and result fetching; it can simply return the number of matching documents.

It would be interesting to know more about how you get different results, though. For that I need more information, such as the exact queries you are sending and whether any indexing is happening on the index at the same time.

But suppose you do the following:

  1. index some documents
  2. refresh the index

After that, _search and _count (with a match_all query) should return the same total. If they don't, that would be very odd.
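One way to run that check is sketched below. This is only a sketch under assumptions: the node address http://localhost:9200, the function names, and the injectable send parameter are all hypothetical, and the track_total_hits option applies to Elasticsearch 7 and later, so pass track_total_hits=False on older versions.

```python
import json
from urllib import request

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def http_send(method, path, body=None):
    """Send a JSON request to the node and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


def compare_counts(index, send=http_send, track_total_hits=True):
    """Refresh the index, then return (count_total, search_total)."""
    # Refresh so that recently indexed documents become visible to both calls.
    send("POST", f"/{index}/_refresh")

    query = {"query": {"match_all": {}}}
    count_total = send("POST", f"/{index}/_count", query)["count"]

    # size=0 skips fetching documents; only the total is needed.
    search_body = {**query, "size": 0}
    if track_total_hits:
        # Elasticsearch 7+ only: ask for an exact total instead of a lower bound.
        search_body["track_total_hits"] = True
    total = send("POST", f"/{index}/_search", search_body)["hits"]["total"]
    search_total = total["value"] if isinstance(total, dict) else total

    return count_total, search_total
```

Calling compare_counts("my_index") against a live node returns both totals; if they differ while nothing is being indexed, the exact queries being sent are the next thing to inspect.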

answered Sep 19 '22 05:09 by Jilles van Gurp