 

Is a database needed with Elasticsearch?

I've been doing a lot of research on Elasticsearch, and I keep stumbling on the question of whether or not a database is needed.

Current Hibernate-Search and Relational Design

My current application is written in Java using Hibernate, Hibernate Search, and a MySQL database. Hibernate Search is built on Lucene and automatically manages my indexes for me during database transactions. It also searches against the index and then pulls the full records from the database based on the stored primary keys, so you don't have to store your entire data model in the index. This has worked wonderfully; however, as my application grows, I keep running into scaling and cost issues because the Lucene indexes need to live on each application server, and you then need another library to keep the indexes in sync. The other issue with this design is that it requires more memory on every application server, since the indexes are replicated and stored alongside the application.
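To make the Hibernate Search pattern described above concrete, here is a minimal in-memory sketch of "search the index for primary keys, then load the full rows from the database". Plain Java maps stand in for both Lucene and MySQL, and `Product`, `KeyOnlyIndex` and `IndexThenHydrate` are hypothetical names for illustration only; this is not Hibernate Search's actual API.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical full row as it would live in MySQL.
final class Product {
    final long id;
    final String name;
    final String description;

    Product(long id, String name, String description) {
        this.id = id;
        this.name = name;
        this.description = description;
    }
}

// Hypothetical stand-in for a Lucene/Elasticsearch index: it keeps only
// token -> primary keys, never the full record.
class KeyOnlyIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    void index(long id, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
            }
        }
    }

    // Returns matching primary keys only, in ascending order.
    List<Long> search(String term) {
        return new ArrayList<>(postings.getOrDefault(term.toLowerCase(), Collections.emptySet()));
    }
}

class IndexThenHydrate {
    // Step 1: ask the index for primary keys. Step 2: load the full rows by key.
    static List<Product> search(KeyOnlyIndex index, Map<Long, Product> db, String term) {
        return index.search(term).stream()
                .map(db::get)              // analogous to SELECT ... WHERE id IN (...)
                .filter(Objects::nonNull)  // skip keys whose rows were deleted (stale index)
                .collect(Collectors.toList());
    }
}
```

The trade-off is the one the question describes: the index stays small, but every search costs a database round trip, and the index must be kept in step with the database.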

Database or No Database

Coming from the Hibernate Search school of thought, I'm confused about whether you're supposed to store your entire data model in Elasticsearch and do away with the traditional database, or whether you're supposed to store only your search data in the indexes and, as with Hibernate Search, return primary keys to pull the complete records from your relational database.

Managing the Indexes

  1. If you're using the indexes alongside a database, should you be maintaining them manually during transactions? I've seen a JDBC project called river, but it looks to be deprecated and not recommended for production use. Is there a library out there capable of handling this automatically as part of your transactions?
  2. If your indexes fall out of sync with your database, is there a recommended way to rebuild them?
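For reference, one common way to approach both questions, sketched here under stated assumptions: stage index work during the transaction, apply it only after the database commit succeeds, remember any documents that failed to index, and keep a full rebuild that re-reads every row from the database. In-memory maps stand in for MySQL and Elasticsearch, and `TxRepository` and its methods are hypothetical, not a library API.

```java
import java.util.*;

// Hypothetical repository: maps stand in for MySQL (db) and Elasticsearch (index).
class TxRepository {
    final Map<Long, String> db = new HashMap<>();
    final Map<Long, String> index = new HashMap<>();
    final Set<Long> needsReindex = new HashSet<>();                  // ids that failed to index
    private final Map<Long, String> staged = new LinkedHashMap<>(); // uncommitted writes

    // Inside the transaction: only stage the change; touch neither store yet.
    void save(long id, String text) {
        staged.put(id, text);
    }

    // Rolled-back work never reaches the database or the index.
    void rollback() {
        staged.clear();
    }

    void commit() {
        db.putAll(staged); // 1. write the system of record first
        for (Map.Entry<Long, String> e : staged.entrySet()) {
            try {
                indexDocument(e.getKey(), e.getValue()); // 2. then update the index
            } catch (RuntimeException ex) {
                needsReindex.add(e.getKey()); // DB is correct; retry or rebuild later
            }
        }
        staged.clear();
    }

    void indexDocument(long id, String text) {
        index.put(id, text);
    }

    // Recovery path when the index drifts: wipe it and re-read every row.
    void rebuildIndex() {
        index.clear();
        db.forEach((id, text) -> indexDocument(id, text));
        needsReindex.clear();
    }
}
```

Indexing after the commit means a crash between the two steps leaves the index behind the database rather than ahead of it, which is the direction a rebuild can always repair.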

Hibernate-Search API

I also saw the following under "API / SPI for alternative backends" on the Hibernate Search roadmap (http://hibernate.org/search/roadmap/):

Define API / SPI abstraction to allow for future external backends integrations such as Apache Solr and Elastic Search.

I'm wondering if anybody has any input on this. Is Hibernate Search capable of managing Elasticsearch indexes automatically for you, just as it does with its native configuration?

If No Database

What would be the drawback of not using a database for anything search-related?

Code Junkie asked Apr 17 '15 15:04



1 Answer

I faced a similar problem before, with an Elasticsearch setup backed by a MySQL database holding the data. The solution was to store in Elasticsearch only the data that needed to be searched, along with a reference to the record in the relational database. If the data in Elasticsearch was enough for the request, I returned only the Elasticsearch record. If it wasn't, I went to the relational database and returned that record instead.
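A minimal sketch of that "answer from the index if it has enough, otherwise go to the database" decision, assuming "enough" means the search document contains every field the caller asked for. Both stores are modeled as in-memory maps, and `IndexFirstLookup.fetch` is a hypothetical helper, not an Elasticsearch client call.

```java
import java.util.*;

class IndexFirstLookup {
    // The search document holds only a subset of the row's fields (plus its id).
    // If it already covers every requested field, answer from the index;
    // otherwise fall back to the full row in the relational database.
    static Map<String, Object> fetch(long id, Set<String> requestedFields,
                                     Map<Long, Map<String, Object>> searchIndex,
                                     Map<Long, Map<String, Object>> database) {
        Map<String, Object> doc = searchIndex.get(id);
        if (doc != null && doc.keySet().containsAll(requestedFields)) {
            return doc;          // fast path: no database round trip
        }
        return database.get(id); // slow path: complete record from the database
    }
}
```

The fast path only helps if the fields most requests need are copied into the index, so choosing which columns to denormalize into Elasticsearch is the main design decision here.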

I split it into these two paths because of the latency the relational database introduced (it was an API for a high-demand web service, and Elasticsearch was faster). That introduced a synchronization problem, but it was not critical for my application: we periodically pulled the data from the relational database and reindexed only the changed records in Elasticsearch. Elasticsearch can reindex just a subset of records.
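The answer doesn't say how changed rows were detected; one common approach, assumed here, is a last-modified timestamp column (e.g. `updated_at`) plus a high-water mark remembered between runs. The sketch below models one periodic pass in memory; `ChangedRow` and `IncrementalReindexer` are hypothetical names.

```java
import java.util.*;

// Hypothetical row carrying a last-modified timestamp (e.g. an updated_at column).
final class ChangedRow {
    final long id;
    final String text;
    final long updatedAt;

    ChangedRow(long id, String text, long updatedAt) {
        this.id = id;
        this.text = text;
        this.updatedAt = updatedAt;
    }
}

class IncrementalReindexer {
    final Map<Long, String> index = new HashMap<>(); // stand-in for Elasticsearch
    private long lastSync = Long.MIN_VALUE;          // high-water mark of the previous pass

    // One periodic pass: reindex only rows modified since the last pass.
    // Returns how many documents were reindexed.
    int run(Collection<ChangedRow> table) {
        long newMark = lastSync;
        int count = 0;
        for (ChangedRow row : table) {
            if (row.updatedAt > lastSync) { // SQL analogue: WHERE updated_at > :lastSync
                index.put(row.id, row.text);
                newMark = Math.max(newMark, row.updatedAt);
                count++;
            }
        }
        lastSync = newMark;
        return count;
    }
}
```

This simple version does not notice deleted rows, and a row committed late with a timestamp at or below the mark would be skipped; a periodic full rebuild from the database covers both gaps.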

We considered not using a database and storing everything in the search engine, but that depends on how important your data is. If you can't risk losing any part of your data, don't store it only in Elasticsearch. We always treated the data in Elasticsearch as perishable, on the basis that the search indexes could be reconstructed from the database.

Ivan answered Sep 21 '22 13:09