A remote CMIS repository contains many folders/files.
I am writing software that keeps a local copy of these folders/files in sync.
What is the most efficient way to check for remote changes?
(addition/removal of files/folders)
Most efficient = least bandwidth usage.
I can only use the CMIS protocol, and I cannot run any custom software on the remote server.
My ideas so far:
Any other ideas?
I don't know the CMIS protocol well, so there might be something more convenient.
After some digging through the CMIS specification you posted, a better version of idea 3 turns out to be easy to accomplish.
2.1.11 Change Log
CMIS provides a “change log” mechanism to allow applications to easily discover the set of changes that have occurred to objects stored in the repository since a previous point in time. This change log can then be used by applications such as search services that maintain an external index of the repository to efficiently determine how to synchronize their index to the current state of the repository (rather than having to query for all objects currently in the repository).
Entries recorded in the change log are referred to below as “change events”.
Note that change events in the change log MUST be returned in ascending order from the time when the change event occurred.
Using whatever tools of your choice, you should be able to do an initial pull of the entire repository and save the time the pull was performed. Subsequent queries to the repository (at an interval of your choosing) are done with the following procedure:
Using the repository's change log is the right way to go, but realize that not every repository supports this. For example, for Alfresco you must configure the audit sub-system and you must set audit.cmischangelog.enabled=true in alfresco-global.properties.
To find out if your repo supports changes you can look at the results of the repository's getCapabilities response. If you see 'Changes' set to 'None' then your repository doesn't support change logs.
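As a rough sketch of that check in Python: the helper below takes the capabilities as a plain dict, however your client library exposes them. The function name, the dict shape, and the two key spellings ('Changes' and 'capabilityChanges') are assumptions for illustration, not a specific library's API. The comparison is case-insensitive because clients may report the value as 'none' or 'None'.

```python
def supports_change_log(capabilities):
    """Return True if the repository reports any change log support.

    `capabilities` is assumed to be a dict built from the getCapabilities
    response. The key may be spelled 'Changes' or 'capabilityChanges'
    depending on the client (an assumption; check your own client).
    """
    value = capabilities.get(
        "Changes", capabilities.get("capabilityChanges", "none")
    )
    # A missing key, Python None, 'None' and 'none' all mean "no change log".
    return str(value).strip().lower() != "none"
```

Any other value (the CMIS spec lists objectidsonly, properties and all) means the change log is available, with differing levels of detail per event.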
Assuming it does, you need to ask the repository for its latest change log token. You can get that from getRepositoryInfo. Save that token before you call getContentChanges. Then, on the next sync, pass in the saved token. You'll get the changes made since that token was issued.
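The token handling described above might look roughly like the following. This is only a sketch: FakeRepo is a hypothetical in-memory stand-in for a real CMIS client, and its method names (latest_change_log_token, content_changes) are made up for illustration. A real client would call getRepositoryInfo and getContentChanges instead, and a real repository may also return the event at the token itself, so applying events should be idempotent.

```python
from dataclasses import dataclass


@dataclass
class ChangeEvent:
    object_id: str
    change_type: str  # CMIS change types: created, updated, deleted, security


class FakeRepo:
    """Hypothetical in-memory stand-in for a CMIS repository."""

    def __init__(self):
        self._log = []

    def record(self, object_id, change_type):
        self._log.append(ChangeEvent(object_id, change_type))

    def latest_change_log_token(self):
        # Stand-in for reading the latest token from getRepositoryInfo.
        return str(len(self._log))

    def content_changes(self, token):
        # Stand-in for getContentChanges: events recorded after `token`.
        return list(self._log[int(token):])


def sync_once(repo, local_ids, token):
    """Apply changes since `token` to `local_ids`; return the next token."""
    # Read the latest token first, so changes that arrive during the sync
    # are seen again next time rather than missed.
    next_token = repo.latest_change_log_token()
    for event in repo.content_changes(token):
        if event.change_type == "deleted":
            local_ids.discard(event.object_id)
        elif event.change_type in ("created", "updated"):
            local_ids.add(event.object_id)  # a real sync would download here
    return next_token
```

Because the local state is a set and each event is applied with add or discard, seeing the same event twice does no harm, which matters if the repository returns events inclusively from the token.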
So, your code needs to:
I have a "cmis-sync" script that does one-way synchronization using this approach implemented in Python. I've tested it against Alfresco as the source and the OpenCMIS InMemory repository as the target. If there is interest I can make it available.