Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Synchronizing table data across databases

I have one table that records its row insert/update timestamps on a field.

I want to synchronize data in this table with another table on another db server. Two db servers are not connected and synchronization is one way (master/slave). Using table triggers is not suitable

My workflow:

  • I use a global last_sync_date parameter and query table Master for the changed/inserted records
  • Output the resulting rows to xml
  • Parse the xml and update table Slave using updates and inserts

The complexity of the problem rises when dealing with deleted records of Master table. To catch the deleted records I think I have to maintain a log table for the previously inserted records and use sql "NOT IN". This becomes a performance problem when dealing with large datasets.

What would be an alternative workflow dealing with this scenario?

like image 358
Serkan Arıkuşu Avatar asked Mar 05 '13 11:03

Serkan Arıkuşu


People also ask

How do you sync tables in different databases?

Specifically, you can create a batch sync node, specify tables in multiple databases as the source tables, and then specify the destination table. After that, you can run the batch sync node to synchronize the data from the source tables to the destination table.

What is the technique of data synchronization?

Data synchronization is the ongoing process of synchronizing data between two or more devices and updating changes automatically between them to maintain consistency within systems. While the sheer quantity of data afforded by the cloud presents challenges, it also provides the perfect solution for big data.


2 Answers

It sounds like you need a transactional message queue.

How this works is simple. When you update the master db you can send a message to the message broker (of whatever the update was) which can go to any number of queues. Each slave db can have its own queue and because queue's preserve order the process should eventually synchronize correctly (ironically this is sort of how most RDBMS do replication internally).

Think of the Message Queue as a sort of SCM change-list or patch-list database. That is for the most part the same (or roughly the same) SQL statements sent to master should be replicated to the other databases eventually. Don't worry about loosing messages as most message queues support durability and transactions.

I recommend you look at spring-amqp and/or spring-integration especially since you tagged this question with spring-batch.

Based on your comments:

  • See Spring Integration: http://static.springsource.org/spring-integration/reference/htmlsingle/ .
  • Google SEDA. Whether you go this route or not you should know about Message queues as it goes hand-in-hand with batch processing.
  • RabbitMQ has a good picture diagram of how messaging works
  • The contents of your message might be the entire row and whether its a CRUD, UPDATE, DELETE. You can use whatever format (e.g. JSON. See spring integration on recommendations).
    • You could even send the direct SQL statements as a message!

BTW your concern of NOT IN being a performance problem is not a very good one as there are a plethora of work-arounds but given your not wanting to do DB specific things (like triggers and replication) I still feel a message queue is your best option.

EDIT - Non MQ route

Since I gave you a tough time about asking this quesiton I will continue to try to help. Besides the message queue you can do some sort of XML file like you we were trying before. THE CRITICAL FEATURE you need in the schema is a CREATE TIMESTAMP column on your master database so that you can do the batch processing while the system is up and running (otherwise you will have to stop the system). Now if you go this route you will want to SELECT * WHERE CREATE_TIME < ? is less than the current time. Basically your only getting the rows at a snapshot.

Now on your other database for the delete your going to remove rows by inner joining on a ID table but with != (that is you can use JOINS instead of slow NOT IN). Luckily you only need all the ids for delete and not the other columns. The other columns you can use a delta based on the the update time stamp column (for update, and create aka insert).

like image 198
Adam Gent Avatar answered Oct 05 '22 22:10

Adam Gent


I am not sure about the solution. But I hope these links may help you.

http://knowledgebase.apexsql.com/2007/09/how-to-synchronize-data-between.htm

http://www.codeproject.com/Tips/348386/Copy-Synchronize-Table-Data-between-databases

like image 21
Shailesh Saxena Avatar answered Oct 05 '22 22:10

Shailesh Saxena