 

What are common pitfalls of timestamp based syncing?

Tags: git, ios, cocoa, sync

I am implementing my first syncing code. In my case I will have 2 types of iOS clients per user that will sync records to a server using a lastSyncTimestamp, a 64-bit integer representing the Unix epoch time in milliseconds of the last sync. Records can be created on the server or on the clients at any time, and the records are exchanged as JSON over HTTP.
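To make the setup concrete, here is a minimal Python sketch of this kind of timestamp-based sync. It is not the actual app code: the `Server` class, `put`, `changes_since`, and the injectable `clock` are hypothetical names invented for illustration. The only thing taken from the description above is that a client keeps a lastSyncTimestamp in epoch milliseconds and asks for everything newer.

```python
import time


class Server:
    """Hypothetical in-memory server that stamps records with its own clock."""

    def __init__(self, clock=None):
        # clock() returns Unix epoch milliseconds; injectable so it can be faked.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._records = {}  # uuid -> (modified_ms, payload)

    def put(self, uuid, payload):
        """Create or update a record, stamping it with the server's time."""
        self._records[uuid] = (self._clock(), payload)

    def changes_since(self, last_sync_ms):
        """Return (records modified after last_sync_ms, new sync token)."""
        now = self._clock()
        changed = {
            uuid: payload
            for uuid, (modified_ms, payload) in self._records.items()
            if modified_ms > last_sync_ms
        }
        # The client stores `now` as its next lastSyncTimestamp.
        return changed, now
```

In this sketch the server both stamps the records and issues the next token, so client clocks never enter the comparison. One boundary case worth testing: a record written in the same millisecond as a sync would be skipped by the strict `>` comparison on the next sync.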

I am not worried about conflicts, as there are few updates and they always come from the same user. However, I am wondering whether there are common things that can go wrong with a timestamp-based approach that I need to be aware of, such as syncing during daylight saving time, syncs conflicting with one another, or other gotchas.
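On the daylight saving point specifically: a Unix epoch value counts from a fixed UTC instant, so it does not change with the local UTC offset. The short Python check below illustrates this with fixed offsets standing in for standard and daylight time. The helper `to_epoch_ms` and the sample dates are assumptions made up for the example; trouble would only arise if timestamps were built from local wall-clock values without an offset.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to Unix epoch milliseconds."""
    return int(dt.timestamp() * 1000)


# The same instant written with two different UTC offsets, as a region
# would see it under standard time (-05:00) and daylight time (-04:00).
standard = datetime(2010, 11, 15, 16, 11, tzinfo=timezone(timedelta(hours=-5)))
daylight = datetime(2010, 11, 15, 17, 11, tzinfo=timezone(timedelta(hours=-4)))
utc = datetime(2010, 11, 15, 21, 11, tzinfo=timezone.utc)

# All three describe one instant, so they map to one epoch value.
same_instant = to_epoch_ms(standard) == to_epoch_ms(daylight) == to_epoch_ms(utc)
```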

I know that git and some other version control systems eschew syncing with timestamps in favor of a content-based negotiation approach. I could imagine such an approach for my apps too: using the uuid or hash of the objects, both peers announce which objects they own, and then exchange them until both peers have the same sets.

If anybody knows any advantages or disadvantages of content-based syncing versus timestamp-based syncing in general that would be helpful as well.

Edit - Here are some of the advantages/disadvantages that I have come up with for timestamp-based and content-based syncing. Please challenge/correct.

Note - I am defining content-based syncing as a simple negotiation between 2 sets of objects. Imagine shuffling 2 identical sets of baseball cards together, giving 2 kids part of the pile each, and telling them to announce any duplicates they find as they look through their cards and hand them over to the other, until they both have identical sets.

  • Johnny - "I got this card."
  • Davey - "I got this bunch of cards. Give me that card."
  • Johnny - "Here is your card. Gimme that bunch of cards."
  • Davey - "Here are your bunch of cards."
  • ....
  • Both - "We are done"
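The exchange above can be sketched as a set difference over record identifiers. This is only an illustration in Python: `sync_peers` and the dict-of-uuid representation are hypothetical, and in a real system the two peers would announce their ID lists over HTTP rather than share memory.

```python
def sync_peers(a, b):
    """Exchange records until both dicts (uuid -> payload) hold the same set.

    Returns the number of records sent in each direction as (a_to_b, b_to_a).
    """
    ids_a, ids_b = set(a), set(b)  # each peer announces what it owns

    missing_in_b = ids_a - ids_b   # "Give me that card."
    missing_in_a = ids_b - ids_a   # "Gimme that bunch of cards."

    for uuid in missing_in_b:
        b[uuid] = a[uuid]
    for uuid in missing_in_a:
        a[uuid] = b[uuid]

    # "We are done": both peers now hold identical ID sets.
    return len(missing_in_b), len(missing_in_a)
```

Running it a second time on the same two peers transfers nothing, which is the well-defined endpoint of the negotiation.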

Advantages of timestamp-based syncing

  • Easy to implement
  • Single property used for syncing.

Disadvantages of timestamp-based syncing

  • Time is relative to the observer, and different machines' clocks can be out of sync. There are a couple of ways to solve this. One is to generate timestamps on a single machine, which doesn't scale well and represents a single point of failure. The other is to use logical clocks such as vector clocks. For the average developer building their own system, vector clocks might be too complex to implement.
  • Timestamp-based syncing works for client-to-master syncing but doesn't work as well for peer-to-peer syncing or where syncing can occur with 2 masters.
  • Whatever generates the timestamp is a single point of failure.
  • Time is not really related to the content of what is being synced.
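For a sense of what the vector clock alternative involves, here is a minimal Python sketch. The function names (`vc_increment`, `vc_merge`, `vc_compare`) and the dict-of-counters representation are assumptions chosen for illustration, not a reference implementation.

```python
def vc_increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def vc_merge(a, b):
    """Combine two clocks by taking the per-node maximum."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def vc_compare(a, b):
    """Classify a relative to b: 'equal', 'before', 'after', or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither happened before the other
```

Unlike a wall-clock comparison, `vc_compare` can report that two edits are concurrent, which is the case a pair of unsynchronized device clocks would silently order one way or the other.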

Advantages of content-based syncing

  • No per-peer timestamp needs to be maintained. 2 peers can start a sync session and start syncing based on the content.
  • Well defined endpoint to sync - when both parties have identical sets.
  • Allows a peer-to-peer architecture, where any peer can act as client or server, provided they can host an HTTP server.
  • Sync works with the content of the sets, not with the abstract concept of time.
  • Since sync is built around content, sync can be used to do content verification if desired. E.g. a SHA-1 hash can be computed on the content and used as the uuid. It can be compared to what is sent during syncing.
  • Even further, SHA-1 hashes can be based on previous hashes to maintain a consistent history of content.
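The last two points can be sketched with Python's standard `hashlib`. The helpers `content_id` and `chained_id` are hypothetical, and serializing the record as JSON with sorted keys is an assumption made so that equal records always hash to the same value.

```python
import hashlib
import json


def content_id(record):
    """SHA-1 of a canonical JSON form of the record, usable as its identifier."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def verify(record, claimed_id):
    """Check that a received record matches the identifier it was sent under."""
    return content_id(record) == claimed_id


def chained_id(previous_id, record):
    """Hash that also covers the previous hash, linking records into a history."""
    combined = previous_id + content_id(record)
    return hashlib.sha1(combined.encode("ascii")).hexdigest()
```

Because each chained hash depends on the one before it, changing an earlier record changes every identifier after it.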

Disadvantages of content-based syncing

  • Extra properties on your objects may be needed to implement it.
  • More logic on both sides compared to timestamp-based syncing.
  • Slightly more chatty protocol (this could be tuned by syncing content in clusters).
asked Nov 15 '10 at 16:11 by John Wright




1 Answer

Part of the problem is that time is not an absolute concept. Whether something happens before or after something else is a matter of perspective, not of compliance with a wall clock.

Read up a bit on the relativity of simultaneity to understand why people have stopped trying to use wall time for figuring these things out and have moved to constructs that represent actual causality, using vector clocks (or at least Lamport clocks).

If you want to use a clock for synchronization, a logical clock will likely suit you best. You will avoid the clock synchronization issues that come with wall-clock time.
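As a rough illustration of the simplest logical clock, here is a Lamport clock in Python. The class and method names are made up for the example; the rule it follows is the standard one: increment on every local event, and on receiving a message jump to one past the larger of the local and received values.

```python
class LamportClock:
    """Minimal Lamport logical clock: a counter that only moves forward."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance for a local event (e.g. creating or sending a record)."""
        self.time += 1
        return self.time

    def receive(self, remote_time):
        """Advance past both the local time and a timestamp from a peer."""
        self.time = max(self.time, remote_time) + 1
        return self.time
```

For example, if a client stamps a record with `client.tick()` and the server calls `server.receive(...)` with that stamp, anything the server does afterwards gets a larger number, regardless of what either machine's wall clock says.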

answered Oct 05 '22 at 23:10 by Dustin