Elasticsearch: document size and query performance

I have an ES index with medium-sized documents (roughly 15-30 MB each).

Each document has a boolean field, and most of the time users just want to know whether a specific document ID has that field set to true.

Will document size affect the performance of this query?

{
   "size": 1,
   "query": {
      "term": {
         "my_field": true
      }
   },
   "_source": [
      "my_field"
   ]
}

And will a "size": 0 query result in better time performance?

asked Jun 13 '16 at 13:06 by betto86


3 Answers

Adding "size": 0 to your query avoids transferring the matching documents over the network, which will improve your response time.

But as I understand your use case, you can use the Count API instead.

An example query:

curl -XPOST 'http://localhost:9200/test/_count' -H 'Content-Type: application/json' -d '{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": "xxxxx"
          }
        },
        {
          "term": {
            "bool_field": true
          }
        }
      ]
    }
  }
}'

With this query you only need to check whether the returned count is greater than zero: if it is, the document with that id has bool_field set to the value you specified in the query (true or false). This will be quite fast.
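Since the asker's `True` suggests a Python client, here is a minimal standard-library sketch of the same count check. It only builds the request and interprets the response; the base URL, index name `test`, and field names `id` / `bool_field` are assumptions carried over from the curl example, and the helper names are hypothetical.

```python
import json
import urllib.request

# Assumption: same local cluster as the curl example; adjust as needed.
ES_URL = "http://localhost:9200"


def build_count_request(index, doc_id, bool_field, expected=True):
    """Build (but do not send) a _count request matching the curl example."""
    body = {
        "query": {
            "bool": {
                "must": [
                    {"term": {"id": doc_id}},            # hypothetical id field
                    {"term": {bool_field: expected}},    # json.dumps writes True as true
                ]
            }
        }
    }
    return urllib.request.Request(
        f"{ES_URL}/{index}/_count",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def flag_matches(count_response):
    """The flag has the expected value iff at least one document matched both terms."""
    return count_response.get("count", 0) > 0
```

To actually run it, send the request with `urllib.request.urlopen(req)` and pass `json.load(resp)` to `flag_matches`. Only a small JSON object with the count comes back, never the 15-30 MB source.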

answered Nov 26 '22 at 12:11 by jordivador


Since Elasticsearch indexes your fields, document size will not be a big factor in query performance. Using "size": 0 doesn't change how fast the query executes inside Elasticsearch, but it does improve the overall response time because less data is transferred over the network.
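As a sketch of what a "size": 0 search looks like from the client side: the body asks for no hits, so the answer has to be read from the hit total instead. The helper names are hypothetical; the version note in the comments reflects the change in Elasticsearch 7.0, where `hits.total` became an object.

```python
import json


def build_size_zero_search(field, value=True):
    # "size": 0 returns no hits (and therefore no large _source documents),
    # only metadata such as the total number of matches.
    return {"size": 0, "query": {"term": {field: value}}}


def total_hits(search_response):
    total = search_response["hits"]["total"]
    # Before 7.0, hits.total is a plain integer; from 7.0 on it is an
    # object such as {"value": 2, "relation": "eq"}.
    if isinstance(total, dict):
        return total["value"]
    return total


print(json.dumps(build_size_zero_search("my_field")))
```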

If you just want to check one boolean field for a specific document, you can simply use the Get API and retrieve only the field you want to check, like this:

curl -XGET 'http://localhost:9200/my_index/my_type/1000?fields=my_field'

In this case Elasticsearch retrieves only the document with _id = 1000 and returns only its my_field field, so you can check the boolean value directly in the response:

{
  "_index": "my_index",
  "_type": "my_type",
  "_id": "1000",
  "_version": 9,
  "found": true,
  "fields": {
    "my_field": [
      true
    ]
  }
}
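A minimal sketch of building that Get URL and reading the flag out of a response shaped like the one above. The helper names are hypothetical. It assumes the pre-5.0 Get API, where the `fields` parameter exists; on 5.0 and later that parameter was replaced (by `stored_fields`, or `_source` filtering for non-stored fields), so the URL and the response shape would differ there.

```python
from urllib.parse import urlencode

ES_URL = "http://localhost:9200"  # assumption: local cluster, as in the curl example


def build_get_url(index, doc_type, doc_id, field):
    # Mirrors: GET /my_index/my_type/1000?fields=my_field (pre-5.0 syntax)
    return f"{ES_URL}/{index}/{doc_type}/{doc_id}?{urlencode({'fields': field})}"


def read_flag(get_response, field):
    """Return the boolean, or None if the document or field is missing."""
    if not get_response.get("found", False):
        return None  # no document with that _id
    values = get_response.get("fields", {}).get(field)
    if not values:
        return None  # field absent from the document
    return values[0]  # "fields" wraps each value in a list
```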
answered Nov 26 '22 at 11:11 by Bruno dos Santos


Looking at your question, I see that you haven't mentioned which Elasticsearch version you are using. There are a lot of factors that affect the performance of an Elasticsearch cluster.

However, assuming you are on the latest Elasticsearch, and since you are after a single value, the best approach is to turn your query into a non-scoring filter query. Filters are fast in Elasticsearch and are easily cached. Making a query non-scoring skips the scoring phase (relevance calculation, etc.) entirely.

To do this:

GET localhost:9200/test_index/test_partition/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "my_field": true
        }
      }
    }
  }
}

Note that we are using the Search API. The constant_score query wraps the term query as a filter, so no relevance scoring is done, which should make it inherently fast.
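For a Python client, the same non-scoring body can be built as a plain dict; a minimal sketch with a hypothetical helper name. Python's `True` becomes the JSON literal `true` once serialized, which is the form Elasticsearch expects.

```python
import json


def constant_score_term(field, value=True):
    # constant_score.filter runs the term query in filter context:
    # every match gets the same score, no relevance is computed,
    # and the filter result can be cached by Elasticsearch.
    return {
        "query": {
            "constant_score": {
                "filter": {"term": {field: value}}
            }
        }
    }


print(json.dumps(constant_score_term("my_field"), indent=2))
```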

For more information, please refer to "Finding exact values" in the Elasticsearch guide.

answered Nov 26 '22 at 10:11 by Sithum