Synchronizing table data across databases

Tags:

java

sql

spring-batch

I have a table that records each row's insert/update timestamp in a field.

I want to synchronize the data in this table with another table on a different DB server. The two DB servers are not connected, and synchronization is one-way (master/slave). Using table triggers is not an option.

My workflow:

- I use a global last_sync_date parameter and query table Master for the changed/inserted records.
- I output the resulting rows to XML.
- I parse the XML and update table Slave using updates and inserts.
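The export and import steps of this workflow can be sketched with the JDK's built-in StAX API (`javax.xml.stream`). This is a minimal sketch under assumptions: the `Row` record (an id plus one `name` column), the element and attribute names, and the class name are all hypothetical, and records require Java 16+. Selecting the changed rows and applying the upserts to Slave are left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class XmlDeltaRoundTrip {

    // Hypothetical row shape: primary key plus one data column.
    record Row(long id, String name) {}

    // Export step: write the changed rows as <row id=".." name=".."/> elements.
    static String toXml(List<Row> rows) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartDocument();
            w.writeStartElement("rows");
            for (Row r : rows) {
                w.writeStartElement("row");
                w.writeAttribute("id", Long.toString(r.id()));
                w.writeAttribute("name", r.name()); // the writer escapes &, < and quotes
                w.writeEndElement();
            }
            w.writeEndElement();
            w.writeEndDocument();
            w.flush();
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML export failed", e);
        }
        return out.toString();
    }

    // Import step: parse the XML back into rows on the slave side.
    static List<Row> fromXml(String xml) {
        List<Row> rows = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("row")) {
                    rows.add(new Row(Long.parseLong(r.getAttributeValue(null, "id")),
                                     r.getAttributeValue(null, "name")));
                }
            }
            r.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML import failed", e);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> changed = List.of(new Row(1, "alpha"), new Row(2, "b & c"));
        String xml = toXml(changed);
        System.out.println(xml);
        System.out.println(fromXml(xml).equals(changed)); // true
    }
}
```

Note that this only carries inserted and updated rows; deleted rows never appear in a timestamp-based export, which is exactly the gap described next.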

The problem gets harder when dealing with records deleted from the Master table. To catch the deleted records, I think I have to maintain a log table of the previously inserted records and use SQL "NOT IN". This becomes a performance problem with large datasets.

What would be an alternative workflow dealing with this scenario?


asked Mar 05 '13 11:03

Serkan Arıkuşu

2 Answers

It sounds like you need a transactional message queue.

How this works is simple. When you update the master DB, you also send a message describing the update to a message broker, which can route it to any number of queues. Each slave DB can have its own queue, and because queues preserve order, each slave should eventually synchronize correctly (ironically, this is roughly how most RDBMSs do replication internally).
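The fan-out described above can be simulated in memory to show why per-slave ordered queues converge. This is a hedged sketch, not a broker client: `Broker`, `Change`, and `drain` are hypothetical stand-ins for a real broker (such as RabbitMQ) and its consumers, and each slave "table" is just a `Map` keyed by ID (Java 16+ for records).

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class FanOutReplication {

    enum Op { INSERT, UPDATE, DELETE }

    // One change captured when the master is updated.
    record Change(Op op, long id, String value) {}

    // Stand-in for a broker: each published change is copied into every
    // registered slave's own FIFO queue, so each slave sees master order.
    static final class Broker {
        private final Map<String, Queue<Change>> queues = new LinkedHashMap<>();

        Queue<Change> queueFor(String slave) {
            return queues.computeIfAbsent(slave, k -> new ArrayDeque<>());
        }

        void publish(Change c) {
            for (Queue<Change> q : queues.values()) {
                q.add(c);
            }
        }
    }

    // A slave consumer drains its queue in order and applies each change.
    static void drain(Queue<Change> queue, Map<Long, String> slaveTable) {
        Change c;
        while ((c = queue.poll()) != null) {
            switch (c.op()) {
                case INSERT, UPDATE -> slaveTable.put(c.id(), c.value());
                case DELETE -> slaveTable.remove(c.id());
            }
        }
    }

    public static void main(String[] args) {
        Broker broker = new Broker();
        Queue<Change> slaveA = broker.queueFor("slaveA");
        Queue<Change> slaveB = broker.queueFor("slaveB");

        // Changes made on the master, in the order they happened.
        List<Change> changes = List.of(
                new Change(Op.INSERT, 1, "a"),
                new Change(Op.INSERT, 2, "b"),
                new Change(Op.UPDATE, 1, "a2"),
                new Change(Op.DELETE, 2, null));
        changes.forEach(broker::publish);

        Map<Long, String> tableA = new HashMap<>();
        Map<Long, String> tableB = new HashMap<>();
        drain(slaveA, tableA);
        drain(slaveB, tableB);
        System.out.println(tableA); // {1=a2}
        System.out.println(tableB); // {1=a2}
    }
}
```

In this sketch a queue only receives changes published after it is registered, so a newly added slave would need an initial full copy before it starts consuming.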

Think of the message queue as a sort of SCM change list or patch list. For the most part, the same (or roughly the same) SQL statements sent to the master should eventually be replayed on the other databases. Don't worry about losing messages, as most message queues support durability and transactions.

I recommend you look at spring-amqp and/or spring-integration, especially since you tagged this question with spring-batch.

Based on your comments:

- See Spring Integration: http://static.springsource.org/spring-integration/reference/htmlsingle/
- Look up SEDA (staged event-driven architecture). Whether or not you go this route, you should know about message queues, as they go hand in hand with batch processing.
- RabbitMQ has good diagrams of how messaging works.
- The contents of your message might be the entire row plus whether it is a CREATE (INSERT), UPDATE, or DELETE. You can use whatever format you like (e.g. JSON; see the Spring Integration documentation for recommendations).
  - You could even send the SQL statements themselves as messages!
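A message carrying the entire row plus the kind of change might be encoded as in this sketch. The field names (`op`, `table`, `row`), the table name `master_table`, and the hand-rolled JSON encoder are assumptions for illustration only; the encoder escapes just quotes and backslashes, so a real system would use a proper JSON library instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class ChangeMessageJson {

    enum Op { INSERT, UPDATE, DELETE }

    // Hypothetical message: kind of change, source table, and the entire row.
    record ChangeMessage(Op op, String table, Map<String, String> row) {

        // Flat rows of string values only; null values become JSON null.
        String toJson() {
            StringJoiner fields = new StringJoiner(",", "{", "}");
            row.forEach((k, v) -> fields.add(quote(k) + ":" + (v == null ? "null" : quote(v))));
            return "{\"op\":" + quote(op.name())
                    + ",\"table\":" + quote(table)
                    + ",\"row\":" + fields + "}";
        }

        // Escapes backslashes and double quotes only (no control characters).
        private static String quote(String s) {
            return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
    }

    public static void main(String[] args) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("id", "42");
        row.put("name", "say \"hi\"");
        ChangeMessage msg = new ChangeMessage(Op.UPDATE, "master_table", row);
        // prints {"op":"UPDATE","table":"master_table","row":{"id":"42","name":"say \"hi\""}}
        System.out.println(msg.toJson());
    }
}
```

Sending the whole row (rather than only the changed columns) keeps each message self-sufficient, so the slave can apply an UPDATE as a plain overwrite.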

By the way, your concern about NOT IN being a performance problem is not a strong one, as there are plenty of workarounds. But given that you don't want to do DB-specific things (like triggers and replication), I still feel a message queue is your best option.

EDIT: Non-MQ route

Since I gave you a tough time about asking this question, I will keep trying to help. Besides the message queue, you can use some sort of XML file like you were trying before. THE CRITICAL FEATURE you need in the schema is a CREATE TIMESTAMP column on your master database, so that you can do the batch processing while the system is up and running (otherwise you will have to stop the system). If you go this route, you will want to SELECT * FROM Master WHERE CREATE_TIME < ?, binding the parameter to the current time captured once at the start of the batch. Basically, you are only getting the rows as of a snapshot.
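The snapshot idea, combined with a delta on the update timestamp, can be sketched in memory. Assumptions: each master row has both a create timestamp and an update timestamp (the `Row` record and its field names are hypothetical), the cutoff is captured once per batch run, and the next run reuses that cutoff as its `lastSync`. The SQL in the comment only illustrates an equivalent query; the table and column names are placeholders.

```java
import java.time.Instant;
import java.util.List;

public class SnapshotDelta {

    // Hypothetical master row with the two timestamp columns.
    record Row(long id, Instant createTime, Instant updateTime) {}

    // Rows changed in [lastSync, cutoff). Roughly the in-memory equivalent of
    //   SELECT * FROM Master WHERE UPDATE_TIME >= ? AND UPDATE_TIME < ?
    // with the parameters bound to lastSync and cutoff.
    static List<Row> delta(List<Row> master, Instant lastSync, Instant cutoff) {
        return master.stream()
                .filter(r -> !r.updateTime().isBefore(lastSync))
                .filter(r -> r.updateTime().isBefore(cutoff))
                .toList();
    }

    // Created at or after lastSync: new to the slave (insert); otherwise update.
    static boolean isInsert(Row r, Instant lastSync) {
        return !r.createTime().isBefore(lastSync);
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2013-03-05T00:00:00Z"); // previous run's cutoff
        List<Row> master = List.of(
                new Row(1, t0.minusSeconds(60), t0.minusSeconds(30)), // already synced
                new Row(2, t0.minusSeconds(60), t0.plusSeconds(10)),  // updated since
                new Row(3, t0.plusSeconds(20), t0.plusSeconds(20)),   // created since
                new Row(4, t0.plusSeconds(99), t0.plusSeconds(99)));  // after the cutoff

        // Captured once; rows written after this are left for the next run.
        Instant cutoff = t0.plusSeconds(50);
        for (Row r : delta(master, t0, cutoff)) {
            System.out.println(r.id() + (isInsert(r, t0) ? " insert" : " update"));
        }
        // prints "2 update" then "3 insert"
    }
}
```

Using a half-open interval `[lastSync, cutoff)` means consecutive runs neither skip nor repeat a row, as long as each run stores its cutoff as the next `lastSync`.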

Now, on your other database, you handle deletes by joining the slave table against a table of the IDs exported from the master and removing the slave rows that have no match. That is an anti-join (for example LEFT JOIN ... WHERE master_id IS NULL, or NOT EXISTS), so you can use JOINs instead of the slow NOT IN. Luckily, you only need the ids for the delete, not the other columns. For the other columns, you can use a delta based on the update timestamp column (for updates, and for creates aka inserts).
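The delete step can be illustrated as a sort-merge anti-join over two sorted ID lists: the IDs exported from the master snapshot and the IDs currently on the slave. This is a minimal in-memory sketch (class and method names are hypothetical); inside the database you would write the LEFT JOIN or NOT EXISTS form and let the optimizer choose the join strategy.

```java
import java.util.ArrayList;
import java.util.List;

public class AntiJoinDeletes {

    // Returns slave IDs that no longer exist on the master (rows to delete).
    // Both arrays must be sorted ascending; each is walked once, which is one
    // way an anti-join can be evaluated without per-row NOT IN lookups.
    static List<Long> idsToDelete(long[] masterIds, long[] slaveIds) {
        List<Long> toDelete = new ArrayList<>();
        int m = 0;
        for (long id : slaveIds) {
            // Advance past master IDs smaller than the current slave ID.
            while (m < masterIds.length && masterIds[m] < id) {
                m++;
            }
            // No matching master ID: the row was deleted on the master.
            if (m == masterIds.length || masterIds[m] != id) {
                toDelete.add(id);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        long[] master = {1, 2, 4, 7};      // IDs exported from the master snapshot
        long[] slave = {1, 2, 3, 4, 5, 7}; // IDs currently on the slave
        System.out.println(idsToDelete(master, slave)); // [3, 5]
    }
}
```

Only the ID column has to travel from master to slave for this step, which keeps the export small even for large tables.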


answered Oct 05 '22 22:10

Adam Gent

I am not sure about a solution, but I hope these links help.

http://knowledgebase.apexsql.com/2007/09/how-to-synchronize-data-between.htm

http://www.codeproject.com/Tips/348386/Copy-Synchronize-Table-Data-between-databases

answered Oct 05 '22 22:10

Shailesh Saxena
