I'd like to sync a large list of items between the client and the server. The list is too big to sync in a single request, so how can I make sure it gets synced correctly with a reasonable number of calls to the synchronization service?
For example, say I want to sync a list with 100,000 items, so I expose a web service with the following signature:
getItems(int offset,int quantity): Item[]
The problem comes when the list is modified between one call and the next. For example:
getItems(0,100) : Return items (in the original list) [0,100)
getItems(100,100): Return items (in the original list) [100,200)
##### before the next call the items 0-100 are removed ####
getItems(200,100): Return items (in the original list) [300,400)
So the items [200,300) are never retrieved. (Duplicate items can also be retrieved if items are added instead of removed.)
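The failure described above can be reproduced with a minimal sketch. It assumes an in-memory Python list standing in for the server-side data; `get_items` here is a hypothetical stand-in for the web service, not its real implementation.

```python
def get_items(items, offset, quantity):
    """Offset-based page over the CURRENT state of the list."""
    return items[offset:offset + quantity]


# Stand-in for the server-side list; each item is just its original index.
server = list(range(1000))
client = []

client += get_items(server, 0, 100)    # original items [0,100)
client += get_items(server, 100, 100)  # original items [100,200)

# Before the next call, the first 100 items are removed on the server.
del server[0:100]

client += get_items(server, 200, 100)  # original items [300,400)

# Original items [200,300) were skipped by the shifted offset.
missing = [i for i in range(200, 300) if i not in client]
print(len(missing), client[200])
```

The third page starts at original item 300 because the removal shifted every remaining item 100 positions towards the front.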
How can I ensure a correct sync of this list?
From time to time, the service should save immutable snapshots. The interface would then be getItems(long snapshotNumber, int offset, int quantity). Because a snapshot never changes, the same offset always refers to the same items, no matter what happens to the live list between calls.
To save time, space, and traffic, not every modification of the list should produce a snapshot. Instead, every modification should produce a log message (e.g. add items, remove a range of items), and those log messages should be sent to the client instead of full snapshots. The interface could be getModification(long snapshotNumber, int modificationNumber): Modification.
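As a rough illustration of this idea, here is a minimal in-memory sketch. Everything beyond the two interface methods is an assumption made for the example: the `Modification` fields, the `apply_modification` helper, the `ListService` class, and the choice to keep one log per snapshot are hypothetical, not part of the answer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Modification:
    kind: str          # "add" or "remove_range" (hypothetical message types)
    index: int         # position where the change applies
    items: Tuple = ()  # inserted items, used by "add"
    count: int = 0     # number of removed items, used by "remove_range"


def apply_modification(items, mod):
    """Apply one log message to a mutable list in place."""
    if mod.kind == "add":
        items[mod.index:mod.index] = mod.items
    elif mod.kind == "remove_range":
        del items[mod.index:mod.index + mod.count]
    else:
        raise ValueError(f"unknown modification kind: {mod.kind}")


class ListService:
    def __init__(self, items):
        self._live = list(items)
        self._snapshots = [tuple(self._live)]  # snapshot 0, immutable
        self._logs = [[]]  # _logs[n]: modifications made after snapshot n

    def modify(self, mod):
        apply_modification(self._live, mod)
        self._logs[-1].append(mod)

    def take_snapshot(self):
        self._snapshots.append(tuple(self._live))
        self._logs.append([])
        return len(self._snapshots) - 1

    def get_items(self, snapshot_number, offset, quantity):
        snapshot = self._snapshots[snapshot_number]
        return list(snapshot[offset:offset + quantity])

    def get_modification(self, snapshot_number, modification_number):
        log = self._logs[snapshot_number]
        if modification_number < len(log):
            return log[modification_number]
        return None  # no more modifications recorded for this snapshot

    def current(self):
        return list(self._live)


service = ListService(range(1000))
client, offset = [], 0

# Phase 1: page through snapshot 0 while the live list keeps changing.
while True:
    page = service.get_items(0, offset, 100)
    if not page:
        break
    client += page
    offset += len(page)
    if offset == 200:
        service.modify(Modification("remove_range", 0, count=100))

# Phase 2: replay the log recorded after snapshot 0.
number = 0
while True:
    mod = service.get_modification(0, number)
    if mod is None:
        break
    apply_modification(client, mod)
    number += 1

print(client == service.current())
```

In this sketch the client first downloads the frozen snapshot page by page and only then replays the modifications, so a removal that happens mid-download cannot shift the offsets it is reading from.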
Can you make the list ordered on some parameter on the server side? A real-world use case for this scenario is showing records in a table in a UI. The number of records on the server side can be huge, so you wouldn't want to fetch the whole list at once; instead, you fetch the next batch each time the user scrolls.
In this case, if the list is ordered, you get a lot of things for free, and your API becomes getItems(long lastRecordId, int quantity). Here lastRecordId is a unique key identifying a particular record. The server uses this key to calculate the offset, retrieves the next batch starting from that position, and returns the recordId of the last record in the batch to the client, which passes it in its next API call.
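A minimal sketch of this key-based paging, under some assumptions: the server keeps record ids in a sorted in-memory list, each item is represented by its id alone, and the client reads the last id from the returned page rather than from a separate field. The `get_items` function below is a hypothetical stand-in for the service.

```python
import bisect


def get_items(sorted_ids, last_record_id, quantity):
    """Return up to `quantity` ids strictly greater than `last_record_id`."""
    start = bisect.bisect_right(sorted_ids, last_record_id)
    return sorted_ids[start:start + quantity]


server = list(range(1000))  # sorted record ids on the server
client = []
last_record_id = -1         # assumed sentinel: smaller than every real id
calls = 0

while True:
    page = get_items(server, last_record_id, 100)
    if not page:
        break
    client += page
    last_record_id = page[-1]  # key of the last record seen so far
    calls += 1
    if calls == 2:
        # Same disturbance as in the question: the first 100 records
        # are removed between two calls.
        del server[0:100]

print(len(client), len(set(client)))
```

Because each call resumes after a key instead of a position, the removal does not shift the next page. The client still holds the 100 removed records, though, which is the stale-copy caveat for data the client has already seen.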
You don't have to maintain snapshots, and no duplicate records are retrieved. The removal and insertion scenarios you mention don't occur in this case. However, if you want the client to track additions and removals that affect data it has already seen, at some point you have to discard the client's copy and start syncing all over again.