
Automatically merge / roll up data in Elasticsearch

Is there an easy way to create a new index from the aggregated results of another index (and maybe merge them)?

I have a large index with products that are similar. They have a product ID to identify which products belong together, but each has a different URL / price and a different title (which I want to preserve somehow in the merge so I can search it).

So if I enter 8 product lines, I would love to have them all roll up into 1 product with a nested array containing the data of the similar products.

I tried the rollup API with the job below, but I couldn't get it to work the way I wanted, and I'm getting the feeling that it is only meant for historical / log data. All my data has the same timestamp, since I update all of it every morning.

PUT _xpack/rollup/job/product
{
  "index_pattern": "products",
  "rollup_index": "products_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "interval": "7d"
    },
    "terms": {
      "fields": [
        "product_id"
      ]
    }
  },
  "metrics": [
    {
      "field": "total_price",
      "metrics": [
        "min",
        "max",
        "sum"
      ]
    }
  ]
}

Thanks!

Hans Wassink asked Jul 03 '18 14:07



1 Answer

For now, the rollup API is mainly intended to roll up numerical data over time, not to merge documents. In your case I would merge the documents at the application level and index one document per product, with the "subdocuments" in a nested object.
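
As a rough illustration of what merging at the application level could look like, here is a minimal Python sketch. It groups flat product documents by product_id and builds one document per product with the variants in a list. The field names title and url, the summary fields min_price and max_price, the variants key, and the mapping body are all assumptions made for this example; only product_id and total_price appear in the question. How the mapping has to be wrapped when creating the index depends on your Elasticsearch version.

```python
from collections import defaultdict


def merge_products(docs):
    """Group flat product docs by product_id into one doc per product.

    Assumes every doc has product_id, title, url and total_price
    (title and url are hypothetical field names for this sketch).
    """
    grouped = defaultdict(list)
    for doc in docs:
        grouped[doc["product_id"]].append(
            {
                "title": doc["title"],
                "url": doc["url"],
                "total_price": doc["total_price"],
            }
        )

    merged = []
    for product_id, variants in grouped.items():
        prices = [v["total_price"] for v in variants]
        merged.append(
            {
                "product_id": product_id,
                # Summary fields so simple price filters do not need a nested query.
                "min_price": min(prices),
                "max_price": max(prices),
                # Each original document survives as one entry, titles included.
                "variants": variants,
            }
        )
    return merged


# Hypothetical mapping properties for the merged index: "variants" is declared
# as a nested field so each variant's title / url / price stay tied together.
MERGED_PROPERTIES = {
    "product_id": {"type": "keyword"},
    "min_price": {"type": "double"},
    "max_price": {"type": "double"},
    "variants": {
        "type": "nested",
        "properties": {
            "title": {"type": "text"},
            "url": {"type": "keyword"},
            "total_price": {"type": "double"},
        },
    },
}
```

Since all the data is refreshed every morning anyway, this merge could run as part of that daily import, with the merged documents written to a separate index (for example through the bulk API) and product_id used as the document ID so that each run overwrites the previous version.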

David Kooijman answered Sep 27 '22 17:09