
Something "Materialized view"-like in ElasticSearch

I have a query that runs every time the website is loaded. This query aggregates over three different term fields and around 3 million documents, and therefore needs 6-7 seconds to complete. The data does not change that frequently, and the freshness of the result is not critical.

I know that I can use an alias to create something like a "view" in the RDBMS world. Is it also possible to populate it, so that the query result gets cached? Is there any other way caching might help in this scenario, or do I have to create an additional index for the aggregated data and update it from time to time?

asked Jan 03 '17 10:01 by nik




1 Answer

I know that the post is old, but regarding views: Elastic added data frames in 7.3.0. You could also use the _reindex API to copy the data into a separate index:

POST /_reindex
{
  "source": {
    "index": "live_index"
  },
  "dest": {
    "index": "caching_index"
  }
}
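
For the data frame route mentioned above, a minimal sketch could look like the following. It assumes a recent 7.x release where the feature is exposed as the `_transform` API (earlier 7.x releases used a `_data_frame/transforms` path instead). The transform id `aggregated_view` and the field names `field_a`, `field_b` and `field_c` are hypothetical placeholders for your three term fields; the index names are reused from the reindex example.

```
# Hypothetical transform: group by three term fields and store one
# pre-aggregated document per combination in caching_index.
PUT _transform/aggregated_view
{
  "source": { "index": "live_index" },
  "dest": { "index": "caching_index" },
  "pivot": {
    "group_by": {
      "field_a": { "terms": { "field": "field_a" } },
      "field_b": { "terms": { "field": "field_b" } },
      "field_c": { "terms": { "field": "field_c" } }
    },
    "aggregations": {
      "doc_count": { "value_count": { "field": "field_a" } }
    }
  }
}

# A transform does nothing until it is started.
POST _transform/aggregated_view/_start
```

The website would then query `caching_index` instead of aggregating over `live_index` on every page load.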

But it will not change your ingestion problem. For that, I think the solution is sharding your index: with 2 or more shards and several nodes, Elasticsearch will be able to parallelize the work.
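
As a rough sketch of the sharding suggestion: the number of primary shards is fixed when an index is created, so you would create a new index with the shard count you want and copy the data into it. The index name `live_index_sharded` and the counts below are only illustrative assumptions, not tuned values.

```
# Hypothetical new index with 3 primary shards (the shard count
# cannot be changed in place on an existing index).
PUT /live_index_sharded
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 1
    }
  }
}

# Copy the existing documents into the new index.
POST /_reindex
{
  "source": { "index": "live_index" },
  "dest": { "index": "live_index_sharded" }
}
```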

But an easier thing to test is to disable the refresh_interval while indexing and re-enable it afterwards. This generally improves ingestion time a lot.
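
A minimal sketch of that, assuming the index is called `live_index` as in the reindex example and that you want to go back to the default interval of `1s` afterwards:

```
# Disable automatic refreshes before the bulk load.
PUT /live_index/_settings
{
  "index": { "refresh_interval": "-1" }
}

# ... run the indexing job here ...

# Restore the refresh interval once indexing is done.
PUT /live_index/_settings
{
  "index": { "refresh_interval": "1s" }
}
```

While the interval is `-1`, newly indexed documents will not become searchable until a refresh happens, so this only fits loads where that delay is acceptable.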

A full article on this use case is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-indexing-speed.html

answered Dec 27 '22 13:12 by Jaycreation