Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to keep normalized models when searching via ElasticSearch?

When setting up a MySQL / ElasticSearch combo, is it better to:

  1. Completely sync all model information to ES (even the non-search data), so that when a result is found, I have all its information handy.

  2. Only sync the searchable fields, and then when I get the results back, use the id field to find the actual data in the MySQL database?

like image 534
johnnietheblack Avatar asked Jul 23 '15 22:07

johnnietheblack


3 Answers

The Elasticsearch model of data prefers non-normalized data, usually. Depending on the use case (large amount of data, underpowered machines, too few nodes etc) keeping relationships in ES (parent-child) to mimic the inner joins and the like from the RDB world is expensive.

Your question is very open-ended and the answer depends on the use-case. Generally speaking:

  • avoid mimicking the exact DB Tables - ES indices plus their relationships
  • advantage of keeping everything in ES is that you don't need to update both mechanisms at the same time
  • if your search-able data is very small compared to the overall amount of data, I don't see why you couldn't synchronize just the search-able data with ES
  • try to flatten the data in ES and resist any impulse of using parent/child just because this is how it's done in MySQL
  • I'm not saying you cannot use parent/child. You can, but make sure you test this before adopting this approach and make sure you are ok with the response times. This is, anyway, a valid advice for any kind of approach you choose.
like image 166
Andrei Stefan Avatar answered Nov 10 '22 17:11

Andrei Stefan


ElasticSearch is a search engine. I would advise you to not use it as a database system. I suggest you to only index the search data and a unique id from your database so that you can retrieve the results from MySQL using the unique key returned by ElasticSearch. This way you'll be using both applications for what they're intended. Elastic search is not the best for querying relations and you'll have to write lot more code for operating on related data than simply using MySql for it.

Also, you don't want to tie up your persistence layer with search layer. These should be as independent as possible, and change in one should not affect the other, as much as possible. Otherwise, you'll have to update both your systems if either has to change. Querying MySQL on some IDs is very fast, so you can use it and leave the slow part (querying on full text) to elastic search.

like image 2
Archit Saxena Avatar answered Nov 10 '22 16:11

Archit Saxena


Although it's depend on situation, I would suggest you to go with #2:

  • Faster when indexing: we only fetch searchable data from DB and index to ES, compare to fetch all and index all
  • Smaller storage size: since indexed data is smaller than #1, it's more easier to backup, restore, recover, upgrade your ES in production. It'll also keep your storage size small when your data growing up, and you can also consider to use SSD to enhance performance with lower cost.
  • In general, a search app will search on some fields and show all possible data to user. E.g searching for products but will show pricing/stock info.. in result page, which only available in DB. So it's nature to have a 2nd step to query for extra info in DB and combine it with search results to display.

Hope it help.

like image 1
Duc.Duong Avatar answered Nov 10 '22 17:11

Duc.Duong