I want to use elasticsearch-river-mysql to continuously transfer data from a MySQL database to Elasticsearch. I'm a beginner with ES and rivers, so I hope you can help me out with my questions.
In this method, you can connect your existing MySQL Database with Elasticsearch and perform CRUD queries. Moreover, you will also see how to perform a search from the Elasticsearch Database. This will be done using the Elasticsearch APIs.
My advice is to use elasticsearch-jdbc-river instead, for several reasons. One is that elasticsearch-jdbc-river is more generic, which helps if you decide to switch RDBMS. Another is that the jdbc-river is still maintained, whereas the other one has not been updated for two years, and Elasticsearch has evolved a lot since then.
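As a rough illustration of what registering a JDBC river looks like, here is a minimal sketch that only builds the request path and JSON body, without sending anything. The layout (`type: jdbc` plus a `jdbc` object with `url`, `user`, `password`, `sql`) follows the plugin's README as I remember it, so check it against the version you install. The river name `my_jdbc_river`, the connection URL, the credentials, the query and the index/type names are all placeholders.

```python
import json


def build_river_definition(jdbc_url, user, password, sql, index, doc_type):
    """Return (path, body) for registering a JDBC river.

    The river name and every argument value are illustrative assumptions;
    nothing here talks to a real cluster.
    """
    body = {
        "type": "jdbc",
        "jdbc": {
            "url": jdbc_url,       # JDBC connection string for the source database
            "user": user,
            "password": password,
            "sql": sql,            # query whose result rows become documents
            "index": index,        # target index (assumed parameter name)
            "type": doc_type,      # target document type (assumed parameter name)
        },
    }
    return "/_river/my_jdbc_river/_meta", body


path, body = build_river_definition(
    jdbc_url="jdbc:mysql://localhost:3306/test",
    user="",
    password="",
    sql="select * from orders",
    index="orders",
    doc_type="order",
)

# Print the request you would send with curl or an HTTP client.
print("PUT", path)
print(json.dumps(body, indent=2))
```

Switching to another RDBMS would then mostly mean changing the JDBC URL and driver, which is the "more generic" point above.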
1. From what I know, the data will be streamed from the MySQL database to the ES cluster which will index it automatically. Is that correct? Are there any timeouts or limits I have to be aware of?
The data should be streamed automatically from MySQL to the Elasticsearch cluster without a timeout limitation, but the bottleneck will be your JVM heap size. I'm not sure how much heap you will need for the amount of data you have, so you will need to test it.
2. How will the foreign key relations between the relational database tables be translated into ES? Will the table row containing the foreign key become an inner object of an ES document, or will some other relation between the ES documents be used?
Elasticsearch is schemaless, so you need to manage those relations yourself inside Elasticsearch. The river just streams the data into your cluster. You can define your mapping when you create your index and then use the river to stream the data into the ES cluster.
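To make the foreign key point concrete, here is a small sketch of one common way to handle it: join the tables in SQL and fold the joined rows into one document per parent row, with the child rows as an inner array. This is plain Python showing the idea, not what the river does internally. The `customers`/`orders` tables, the column names and the mapping are hypothetical, and the mapping uses the ES 1.x style (`string` field type) that was current when rivers existed.

```python
def rows_to_documents(rows):
    """Fold flat JOIN rows into one document per customer.

    Each row is assumed to come from something like
    `SELECT ... FROM customers LEFT JOIN orders ON orders.customer_id = customers.id`.
    """
    docs = {}
    for row in rows:
        doc = docs.setdefault(
            row["customer_id"],
            {"id": row["customer_id"], "name": row["customer_name"], "orders": []},
        )
        # A LEFT JOIN yields NULL order columns for customers without orders.
        if row["order_id"] is not None:
            doc["orders"].append({"id": row["order_id"], "total": row["order_total"]})
    return list(docs.values())


# Hypothetical mapping that declares the inner array as a nested object.
customer_mapping = {
    "mappings": {
        "customer": {
            "properties": {
                "name": {"type": "string"},
                "orders": {
                    "type": "nested",
                    "properties": {
                        "id": {"type": "long"},
                        "total": {"type": "double"},
                    },
                },
            }
        }
    }
}

sample_rows = [
    {"customer_id": 1, "customer_name": "Alice", "order_id": 10, "order_total": 25.0},
    {"customer_id": 1, "customer_name": "Alice", "order_id": 11, "order_total": 40.0},
    {"customer_id": 2, "customer_name": "Bob", "order_id": None, "order_total": None},
]

documents = rows_to_documents(sample_rows)
for document in documents:
    print(document)
```

Whether you pick inner objects, nested objects or separate documents depends on how you need to query the data, so treat this as one option rather than the only translation.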
3. Are there any disadvantages to using this river for the above-mentioned purpose?
Rivers will eventually be replaced by another, cleaner way to stream this data, but this is the best solution you have for now.