I'm looking for implementations of set reconciliation algorithm. The problem is following: there are two sets with elements identified by some relatively compact value (e.g. UUID or MD5/SHA1/whatever hash) sitting on different machines. These sets differ in relatively few elements and I want to synchronize these sets while transferring minimal amount of data. Most of googling leads here. This is GPL'd implementation of what seems to be the state-of-art approach to the task. The problem is that I can't use GPL'd code in my app. Most likely I'll have to reimplement it myself using something like nzmath, but maybe there are other implementations (preferably Python or C/C++), or maybe there are other nicer algorithms?
Not being able to use GPL is often a matter of abstraction; that is if it is the license you have problems with. So if you create a small GPL application (released under GPL) you can call this from your non-GPL application. Why re-invent the wheel?
Especially if you can use a python script which already exists: why not leverage it? Of course things are different if you can not expose the element reconsolidation algorithms.
This code is out of my head, and thus covered by whatever license applies for code samples in this site.
# given two finite sequences of unique and hashable data,
# return needed opcodes and data needed for reconciliation
def set_reconcile(src_seq, dst_seq):
"Return required operations to mutate src_seq into dst_seq"
src_set= set(src_seq) # no-op if already of type set
dst_set= set(dst_seq) # ditto
for item in src_set - dst_set:
yield 'delete', item
for item in dst_set - src_set:
yield 'create', item
Use as follows:
for opcode, datum in set_reconcile(machine1_stuff, machine2_stuff):
if opcode == 'create':
# act accordingly
elif opcode == 'delete':
# likewise
else:
raise RuntimeError, 'unexpected opcode'
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With