
Elasticsearch vs HBase/Hadoop for real-time statistics

I'm logging millions of small log documents weekly in order to do:

  • ad hoc queries for data mining
  • joining, comparing, filtering and calculating values
  • a large number of full-text searches from Python
  • running these operations over all of the millions of docs, sometimes every day

My first thought was to put all the docs in HBase/HDFS and run Hadoop jobs to generate the stats results.

The problem is that some of the results must be near real-time.

So, after some research, I discovered Elasticsearch, and now I'm thinking about transferring all of the documents into it and using query DSL to generate the stats results.

Is this a good idea? Elasticsearch seems to handle millions or billions of documents easily.

user3175226 asked Feb 26 '14 13:02



1 Answer

  • For real-time search and analytics, Elasticsearch is a good choice.
  • It is definitely easier to set up and manage than Hadoop/HBase/HDFS.
  • A good comparison of Elasticsearch and HBase: http://db-engines.com/en/system/Elasticsearch%3BHBase
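
To give a rough idea of what "stats via query DSL" could look like from Python, here is a minimal sketch that only builds the JSON body for a `_search` request; it does not talk to a cluster. The field names (`message`, `@timestamp`, `level`, `response_ms`) and the helper `build_stats_query` are hypothetical, chosen for illustration, and would need to match your own mapping. It combines a full-text `match` with a time `range` filter and computes per-hour counts, an average, and a per-level breakdown in one request.

```python
import json


def build_stats_query(text, since="now-1d/d"):
    """Build a search body: full-text filter plus aggregations.

    All field names here are assumptions, not part of the original question.
    """
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "query": {
            "bool": {
                # full-text match on the (hypothetical) log message field
                "must": [{"match": {"message": text}}],
                # restrict to recent documents using date math
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        "aggs": {
            "per_hour": {
                # recent Elasticsearch versions use "fixed_interval";
                # older ones used "interval" instead
                "date_histogram": {
                    "field": "@timestamp",
                    "fixed_interval": "1h",
                },
                "aggs": {
                    # average of a hypothetical numeric field per bucket
                    "avg_response": {"avg": {"field": "response_ms"}}
                },
            },
            # document count per (hypothetical) log level
            "by_level": {"terms": {"field": "level"}},
        },
    }


if __name__ == "__main__":
    # The resulting JSON can be sent as the body of POST /<index>/_search
    print(json.dumps(build_stats_query("timeout"), indent=2))
```

Because the aggregations run inside the search request, results come back in a single round trip, which is what makes this approach attractive for the near-real-time part of the workload; heavy joins across document types are a different matter and may still fit a batch job better.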
Jasper answered Oct 11 '22 12:10
