I'm looking for some general strategies for synchronizing data on a central server with client applications that are not always online.
In my particular case, I have an android phone application with an sqlite database and a PHP web application with a MySQL database.
Users will be able to add and edit information on the phone application and on the web application. I need to make sure that changes made one place are reflected everywhere even when the phone is not able to immediately communicate with the server.
I am not concerned with how to transfer data from the phone to the server or vice versa. I'm mentioning my particular technologies only because I cannot use, for example, the replication features available to MySQL.
I know that the client-server data synchronization problem has been around for a long, long time and would like information - articles, books, advice, etc - about patterns for handling the problem. I'd like to know about general strategies for dealing with synchronization to compare strengths, weaknesses and trade-offs.
Database synchronization establishes data consistency between two or more databases, automatically copying changes back and forth. Harmonization of the data over time should be performed continuously.
I would recommend that you have a timestamp column in every table and every time you insert or update, update the timestamp value of each affected row. Then, you iterate over all tables checking if the timestamp is newer than the one you have in the destination database. If it´s newer, then check if you have to insert or update.
Observation 1: be aware of physical deletes since the rows are deleted from source db and you have to do the same at the server db. You can solve this avoiding physical deletes or logging every deletes in a table with timestamps. Something like this: DeletedRows = (id, table_name, pk_column, pk_column_value, timestamp)
So, you have to read all the new rows of DeletedRows table and execute a delete at the server using table_name, pk_column and pk_column_value.
Observation 2: be aware of FK since inserting data in a table that´s related to another table could fail. You should deactivate every FK before data synchronization.
The first thing you have to decide is a general policy about which side is considered "authoritative" in case of conflicting changes.
I.e.: suppose Record #125 is changed on the server on January 5th at 10pm and the same record is changed on one of the phones (let's call it Client A) on January 5th at 11pm. Last synch was on Jan 3rd. Then the user reconnects on, say, January 8th.
Identifying what needs to be changed is "easy" in the sense that both the client and the server know the date of the last synch, so anything created or updated (see below for more on this) since the last synch needs to be reconciled.
So, suppose that the only changed record is #125. You either decide that one of the two automatically "wins" and overwrites the other, or you need to support a reconcile phase where a user can decide which version (server or client) is the correct one, overwriting the other.
This decision is extremely important and you must weight the "role" of the clients. Especially if there is a potential conflict not only between client and server, but in case different clients can change the same record(s).
[Assuming that #125 can be modified by a second client (Client B) there is a chance that Client B, which hasn't synched yet, will provide yet another version of the same record, making the previous conflict resolution moot]
Regarding the "created or updated" point above... how can you properly identify a record if it has been originated on one of the clients (assuming this makes sense in your problem domain)? Let's suppose your app manages a list of business contacts. If Client A says you have to add a newly created John Smith, and the server has a John Smith created yesterday by Client D... do you create two records because you cannot be certain that they aren't different persons? Will you ask the user to reconcile this conflict too?
Do clients have "ownership" of a subset of data? I.e. if Client B is setup to be the "authority" on data for Area #5 can Client A modify/create records for Area #5 or not? (This would make some conflict resolution easier, but may prove unfeasible for your situation).
To sum it up the main problems are:
Bibliography:
More on this, of course, on Wikipedia.
A simple synchronization algorithm by the author of Vdirsyncer
OBJC article on data synch
SyncML®: Synchronizing and Managing Your Mobile Data (Book on O'Reilly Safari)
Conflict-free Replicated Data Types
Optimistic Replication YASUSHI SAITO (HP Laboratories) and MARC SHAPIRO (Microsoft Research Ltd.) - ACM Computing Surveys, Vol. V, No. N, 3 2005.
Alexander Traud, Juergen Nagler-Ihlein, Frank Kargl, and Michael Weber. 2008. Cyclic Data Synchronization through Reusing SyncML. In Proceedings of the The Ninth International Conference on Mobile Data Management (MDM '08). IEEE Computer Society, Washington, DC, USA, 165-172. DOI=10.1109/MDM.2008.10 http://dx.doi.org/10.1109/MDM.2008.10
Lam, F., Lam, N., and Wong, R. 2002. Efficient synchronization for mobile XML data. In Proceedings of the Eleventh international Conference on information and Knowledge Management (McLean, Virginia, USA, November 04 - 09, 2002). CIKM '02. ACM, New York, NY, 153-160. DOI= http://doi.acm.org/10.1145/584792.584820
Cunha, P. R. and Maibaum, T. S. 1981. Resource &equil; abstract data type + synchronization - A methodology for message oriented programming -. In Proceedings of the 5th international Conference on Software Engineering (San Diego, California, United States, March 09 - 12, 1981). International Conference on Software Engineering. IEEE Press, Piscataway, NJ, 263-272.
(The last three are from the ACM digital library, no idea if you are a member or if you can get those through other channels).
From the Dr.Dobbs site:
From arxiv.org:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With