Very slow elasticsearch term aggregation. How to improve?

We have ~20M documents (hotel offers) stored in Elasticsearch (1.6.2). The goal is to group documents by multiple fields (duration, start_date, adults, kids) and select the cheapest offer from each group. The results must then be sorted by the cost field.

To avoid sub-aggregations, we combined the values of the target fields into a single field called default_group_field by joining them with a dot (.).
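A minimal sketch of how such a combined value could be built at index time. The function name `build_group_key` and the ISO date format are assumptions for illustration; the actual indexing code is not shown in the question.

```python
def build_group_key(duration, start_date, adults, kids):
    """Join the grouping fields with a dot, in a fixed order.

    start_date is assumed to already be a string such as "2016-07-01";
    the real format used in the index is not known.
    """
    return ".".join(str(value) for value in (duration, start_date, adults, kids))


# Example document prepared for indexing (field values are made up).
offer = {"duration": 7, "start_date": "2016-07-01", "adults": 2, "kids": 1, "cost": 499}
offer["default_group_field"] = build_group_key(
    offer["duration"], offer["start_date"], offer["adults"], offer["kids"]
)
```

Every distinct combination of the four fields yields a distinct string, which is why the field ends up with a very large number of unique terms.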

Mapping for the field looks like this:

  "default_group_field": {
    "index": "not_analyzed",
    "fielddata": {
      "loading": "eager_global_ordinals"
    },
    "type": "string"
  }

The query we run looks like this:

{
  "size": 0,
  "aggs": {
    "offers": {
      "terms": {
        "field": "default_group_field",
        "size": 5,
        "order": {
          "min_sort_value": "asc"
        }
      },
      "aggs": {
        "min_sort_value": {
          "min": {
            "field": "cost"
          }
        },
        "cheapest": {
          "top_hits": {
            "_source": {},
            "sort": {
              "cost": "asc"
            },
            "size": 1
          }
        }
      }
    }
  },
  "query": {
    "filtered": {
      "filter": {
        "and": [
          ...
        ]
      }
    }
  }
}

The problem is that this query takes 2-5 seconds to run.

However, when we run the query without aggregations, we get a moderate number of results (say "total": 490) in under 100 ms:

{
  "took": 53,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 490,
    "max_score": 1,
    "hits": [...

But with the aggregation it takes about 2 seconds:

{
  "took": 2158,
  "timed_out": false,
  "_shards": {
    "total": 6,
    "successful": 6,
    "failed": 0
  },
  "hits": {
    "total": 490,
    "max_score": 0,
    "hits": [

    ]
  },...

It seems it should not take this long to process such a moderate number of filtered documents and select the cheapest one from every group. It could be done inside the application, but that seems like an ugly hack to me.
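For comparison, the application-side approach mentioned above could be sketched roughly as follows. It assumes the filtered hits (about 490 here) have already been fetched and that each one is a dict carrying `default_group_field` and `cost`; the function name and the sample data are hypothetical.

```python
def cheapest_per_group(offers, limit=5):
    """Keep the cheapest offer per group, then return the cheapest groups first."""
    best = {}
    for offer in offers:
        key = offer["default_group_field"]
        # Remember only the lowest-cost offer seen so far for this group.
        if key not in best or offer["cost"] < best[key]["cost"]:
            best[key] = offer
    # Mirror the "order by min cost asc, size 5" part of the aggregation.
    return sorted(best.values(), key=lambda o: o["cost"])[:limit]


sample = [
    {"default_group_field": "7.2016-07-01.2.1", "cost": 520},
    {"default_group_field": "7.2016-07-01.2.1", "cost": 480},
    {"default_group_field": "10.2016-07-05.2.0", "cost": 300},
]
result = cheapest_per_group(sample)
```

This only works when the filtered result set is small enough to pull into the application in full, which is the trade-off behind calling it a hack.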

The log is full of lines stating:

[DEBUG][index.fielddata.plain ] [Karen Page] [offers] Global-ordinals[default_group_field][2564761] took 2453 ms

That is why we updated our mapping to rebuild global ordinals eagerly when the index is updated; however, this had no notable impact on query timings.

Is there any way to speed up such an aggregation, or a way to tell Elasticsearch to aggregate over the filtered documents only?

Or is there perhaps another cause of such long query execution? Any ideas are highly appreciated!

prikha asked Jun 03 '16 12:06


1 Answer

Thanks again for the effort.

We have finally solved the main problem and our performance is back to normal.

In short, we did the following:
- updated the mapping for default_group_field to type long
- compressed the default_group_field values so that they fit into a long

Some explanations:

Aggregations on string fields require extra work. As the logs show, building global ordinals for this field, which has a very large number of distinct values, was very expensive. In fact, we only ever run aggregations on this field, so the string type is not an efficient choice for it.

So we have changed the mapping to:

default_group_field: {
  type: 'long',
  index: 'not_analyzed'
}

This way we avoid those expensive operations.

After this change, the same query's timing dropped to ~100 ms. CPU usage also went down.
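The compression step itself is not shown in this answer. One possible way to do it, sketched here purely as an assumption, is to pack the four grouping values into the bits of a single 64-bit integer. The bit widths, the day-count encoding of start_date, and the function names are all hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)
# Assumed bit widths: 10 + 20 + 6 + 6 = 42 bits, well within a signed 64-bit long.
DURATION_BITS, DAYS_BITS, ADULTS_BITS, KIDS_BITS = 10, 20, 6, 6


def pack_group_key(duration, start_date, adults, kids):
    """Pack the grouping fields into one non-negative integer."""
    days = (start_date - EPOCH).days
    fields = (
        ("duration", duration, DURATION_BITS),
        ("days", days, DAYS_BITS),
        ("adults", adults, ADULTS_BITS),
        ("kids", kids, KIDS_BITS),
    )
    key = 0
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
        key = (key << bits) | value
    return key


def unpack_group_key(key):
    """Reverse pack_group_key, reading the fields back from the low bits up."""
    kids = key & ((1 << KIDS_BITS) - 1)
    key >>= KIDS_BITS
    adults = key & ((1 << ADULTS_BITS) - 1)
    key >>= ADULTS_BITS
    days = key & ((1 << DAYS_BITS) - 1)
    key >>= DAYS_BITS
    return key, EPOCH + timedelta(days=days), adults, kids
```

Because each field occupies its own bit range, two offers get the same number only if all four values match, so grouping on the packed value is equivalent to grouping on the dotted string.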

PS 1

I got a lot of information from the docs on global ordinals.

PS 2

I still have no idea how to work around this issue with a field of type string. Please comment if you have any ideas.

prikha answered Sep 27 '22 17:09