I am trying to figure out a way to keep my mysql db and elasticsearch db in sync. I have setup a jdbc river using the jprante / elasticsearch-river-jdbc plugin for elasticsearch. When I execute the below request:
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
"driver" : "com.mysql.jdbc.Driver",
"url" : "jdbc:mysql://localhost:3306/MY-DATABASE",
"user" : "root",
"password" : "password",
"sql" : "select * from users",
"poll" : "1m"
},
"index" : {
"index" : "test_index",
"type" : "user"
}
}'
the river starts indexing data, but for some records I get org.elasticsearch.index.mapper.MapperParsingException
. Well there is discussion related to this issue here, but I want to know a way to get around this issue.
Is it possible to permanently fix this by creating an explicit mapping for all 'fields' of the 'type' that I am trying to index or is there a better way to solve this issue?
Another question that I have is, when the jdbc-river polls the database again, it seems to re-index the entire data-set(given in sql query) again into ES. I am not sure, but is this done because elasticsearch wants to add fresh data as well as update any changes in the existing data? Is it possible to index only the fresh data, if the table's data is static?
Did you look at default mapping? http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html
I think it can help you here.
If you have an insertion date field in your datatable, you can use it to filter what you have to index. See https://github.com/jprante/elasticsearch-river-jdbc#time-based-selecting
HTH
David
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With