
Keeping elasticsearch and database in sync

I am trying to figure out how to keep my MySQL database and my Elasticsearch index in sync. I have set up a JDBC river using the jprante/elasticsearch-river-jdbc plugin for Elasticsearch. When I execute the request below:

curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
"type" : "jdbc",
"jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://localhost:3306/MY-DATABASE",
    "user" : "root",
    "password" : "password",
    "sql" : "select * from users",
    "poll" : "1m"
},
"index" : {
    "index" : "test_index",
    "type" : "user"
}
}'

the river starts indexing data, but for some records I get org.elasticsearch.index.mapper.MapperParsingException. There is an existing discussion of this issue, but I would like to know how to work around it.

Is it possible to fix this permanently by creating an explicit mapping for all fields of the type that I am trying to index, or is there a better way to solve it?

My other question: when the JDBC river polls the database again, it seems to re-index the entire data set (as selected by the SQL query) into ES. I am not sure, but is this done so that Elasticsearch picks up changes to existing rows as well as newly added rows? If the table's existing data is static, is it possible to index only the new rows?

asked Oct 03 '12 12:10 by serpent403



1 Answer

Have you looked at the default mapping? See http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html

I think it can help you here.
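As a rough sketch of the explicit-mapping idea from the question: one common cause of MapperParsingException is that a field's type was guessed dynamically from an early document and a later value does not fit that type. Pinning the type up front avoids the guess. The field names `name` and `zip_code` below are hypothetical, and the `string` type assumes an Elasticsearch version from the river era; adjust both to your table and version.

```shell
# Explicit mapping for the "user" type.
# ASSUMPTION: "name" and "zip_code" are made-up columns of the users table.
MAPPING=$(cat <<'EOF'
{
  "user" : {
    "properties" : {
      "name"     : { "type" : "string" },
      "zip_code" : { "type" : "string" }
    }
  }
}
EOF
)

# Create the index first, then register the mapping,
# ideally before the river writes its first document.
put_user_mapping() {
  curl -XPUT 'localhost:9200/test_index'
  curl -XPUT 'localhost:9200/test_index/user/_mapping' -d "$MAPPING"
}
```

Calling `put_user_mapping` against a running node sends both requests. If the index already holds documents with a conflicting type for a field, it would probably need to be deleted and re-indexed for the new mapping to take effect.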

If your table has an insertion date column, you can use it to filter which rows get indexed on each poll. See https://github.com/jprante/elasticsearch-river-jdbc#time-based-selecting
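A minimal sketch of that filtering, reusing the river definition from the question and relying only on plain MySQL date arithmetic rather than any river-specific parameter. The column `created_at` is an assumption (the question does not name one), and the two-minute window is an arbitrary choice that overlaps the one-minute poll interval.

```shell
# River definition that selects only recently inserted rows.
# ASSUMPTION: users.created_at holds the insertion time of each row.
RIVER=$(cat <<'EOF'
{
  "type" : "jdbc",
  "jdbc" : {
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://localhost:3306/MY-DATABASE",
    "user" : "root",
    "password" : "password",
    "sql" : "select * from users where created_at > now() - interval 2 minute",
    "poll" : "1m"
  },
  "index" : {
    "index" : "test_index",
    "type" : "user"
  }
}
EOF
)

# Register the river; it then runs the query on every poll.
create_river() {
  curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d "$RIVER"
}
```

The window is deliberately wider than the poll interval so that a slow poll does not skip rows. That overlap is only harmless if a re-read row overwrites its earlier document instead of creating a duplicate, so check in the plugin README how document IDs are assigned before relying on it.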

HTH

David

answered Sep 24 '22 21:09 by dadoonet
