
Reasons against using Elasticsearch as an OLAP cube

At first glance, it seems that with Elasticsearch as a backend it is easy and fast to build reports with pivot-like functionality as used in traditional business intelligence environments.

By "pivot-like" I mean that, in SQL terms, data is grouped by one or two dimensions, filtered, ordered by one or two dimensions, and aggregated into several metrics, e.g. with sum or count.
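For illustration, a query of that shape could be expressed as an Elasticsearch search body roughly like the sketch below. It only builds the request as a Python dict and does not contact a cluster. The helper name `build_pivot_query` and the field names (`country`, `product`, `revenue`, `order_id`, `year`) are hypothetical, and the fields used for grouping are assumed to be mapped as keyword or numeric types.

```python
import json


def build_pivot_query(group_by, metrics, filters=None, order_by=None, size=100):
    """Build a search body for a pivot-style report (illustrative sketch).

    group_by: list of one or two field names (the dimensions)
    metrics:  result name -> (aggregation type, field), e.g. ("sum", "revenue")
    filters:  field -> exact value to filter on
    order_by: (metric name, "asc" or "desc") used to sort the buckets
    """
    if not 1 <= len(group_by) <= 2:
        raise ValueError("expected one or two dimensions")

    # Metric aggregations such as sum or value_count.
    metric_aggs = {name: {agg: {"field": field}}
                   for name, (agg, field) in metrics.items()}

    # Nest one terms aggregation per dimension, innermost first.
    # The metrics are attached at every level so that each level
    # can be ordered by them (and gets its own subtotals).
    nested = {}
    for field in reversed(group_by):
        terms = {"field": field, "size": size}
        if order_by:
            metric, direction = order_by
            terms["order"] = {metric: direction}
        nested = {f"by_{field}": {"terms": terms,
                                  "aggs": {**metric_aggs, **nested}}}

    body = {"size": 0, "aggs": nested}  # size 0: return buckets only, no hits
    if filters:
        body["query"] = {"bool": {"filter": [{"term": {f: v}}
                                             for f, v in filters.items()]}}
    return body


body = build_pivot_query(
    group_by=["country", "product"],
    metrics={"revenue": ("sum", "revenue"),
             "orders": ("value_count", "order_id")},
    filters={"year": 2016},
    order_by=("revenue", "desc"),
)
print(json.dumps(body, indent=2))
```

This corresponds loosely to a `SELECT ... WHERE ... GROUP BY ... ORDER BY ...` statement: the `terms` aggregations play the role of `GROUP BY`, the `bool` filter the role of `WHERE`.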

By "easy" I mean that with a sufficiently large cluster, no pre-aggregation of the data is required, which saves ETL work and data engineering time.

By "fast" I mean that, thanks to Elasticsearch's near-real-time capability, report latency can often be reduced compared to traditional business intelligence systems.

Are there any reasons not to use Elasticsearch for the above purpose?

asked Feb 19 '16 at 19:02 by user1091141



1 Answer

Elasticsearch is a great alternative to a cube; we use it for that same purpose today. One huge benefit: with a cube you need to know up front which dimensions you want to report on. With ES you just shove in more and more data and figure out later how you want to report on it.

At our company, data regularly goes through the following life cycle:

  1. The record is written to SQL.
  2. The primary key from SQL is written to RabbitMQ.
  3. We respond to the customer very quickly.
  4. When the RabbitMQ consumer gets to the message, it uses the primary key to gather all the data we want to report on.
  5. That data is written to Elasticsearch.
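The flow of those five steps can be sketched roughly as follows. This is not our production code: a dict stands in for the SQL table, `queue.Queue` for RabbitMQ, and another dict for the Elasticsearch index, and the function and field names are made up for the example.

```python
import itertools
import queue

sql_table = {}               # stand-in for SQL: primary key -> row
work_queue = queue.Queue()   # stand-in for a RabbitMQ queue
es_index = {}                # stand-in for an Elasticsearch index: _id -> doc
_ids = itertools.count(1)    # stand-in for an auto-increment primary key


def handle_request(payload):
    """Steps 1-3: store the record, enqueue only its key, respond at once."""
    pk = next(_ids)
    sql_table[pk] = dict(payload)
    work_queue.put(pk)
    return {"status": "accepted", "id": pk}


def build_report_document(pk):
    """Step 4: gather everything worth reporting on for one primary key."""
    row = sql_table[pk]
    # A real consumer would join related tables here; this sketch only
    # denormalizes the row and adds one derived field.
    return {"id": pk, **row, "total": row["quantity"] * row["unit_price"]}


def drain_queue():
    """Steps 4-5: consume queued keys and write the documents to the index."""
    processed = 0
    while True:
        try:
            pk = work_queue.get_nowait()
        except queue.Empty:
            return processed
        es_index[pk] = build_report_document(pk)
        processed += 1


response = handle_request({"customer": "acme", "quantity": 3, "unit_price": 5})
processed = drain_queue()
```

The point of queueing only the primary key is that the customer-facing request stays cheap, while the expensive gathering of report data happens later and always reads the current state from SQL.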

A word of advice: if you think you might want to report on something, capture it from the beginning. Inserting 1M rows into ES is very easy; updating 1M rows is a bigger pain.
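To give a feel for the difference, here is a minimal sketch of the newline-delimited request bodies that the Elasticsearch bulk API expects for inserts versus partial updates. It assumes a recent, typeless Elasticsearch version. The index name `orders` and the fields are hypothetical, and nothing is sent to a cluster.

```python
import json


def bulk_index_body(index, docs):
    """Bulk body that inserts (or fully replaces) each document."""
    lines = []
    for doc_id, source in docs.items():
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_update_body(index, partial_docs):
    """Bulk body that merges new fields into documents that already exist."""
    lines = []
    for doc_id, partial in partial_docs.items():
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": partial}))
    return "\n".join(lines) + "\n"


insert_body = bulk_index_body("orders", {1: {"total": 15}})
update_body = bulk_update_body("orders", {1: {"region": "emea"}})
print(insert_body)
print(update_body)
```

The request bodies look similar, but for an update you have to know the id of every affected document and work out the new field values for each of them, and Elasticsearch re-indexes the whole document behind the scenes. Adding a field after the fact to 1M documents therefore means revisiting all 1M of them.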

answered Oct 25 '22 at 00:10 by jhilden