
Implementation of set reconciliation algorithm

I'm looking for implementations of a set reconciliation algorithm. The problem is as follows: there are two sets, sitting on different machines, whose elements are identified by some relatively compact value (e.g. a UUID or an MD5/SHA1/whatever hash). The sets differ in relatively few elements, and I want to synchronize them while transferring a minimal amount of data. Most of my googling leads here. This is a GPL'd implementation of what seems to be the state-of-the-art approach to the task. The problem is that I can't use GPL'd code in my app. Most likely I'll have to reimplement it myself using something like nzmath, but maybe there are other implementations (preferably Python or C/C++), or maybe there are other, nicer algorithms?

fionbio asked Sep 21 '09 21:09


2 Answers

Not being able to use GPL code is often a matter of abstraction, assuming the license itself is what you have a problem with. If you create a small application that is released under the GPL, you can call it from your non-GPL application. Why reinvent the wheel?

Especially if a Python script already exists: why not leverage it? Of course, things are different if you cannot expose the set reconciliation algorithm.

Adriaan answered Nov 04 '22 11:11


This code is off the top of my head, and thus covered by whatever license applies to code samples on this site.

# given two finite sequences of unique and hashable data,
# return needed opcodes and data needed for reconciliation

def set_reconcile(src_seq, dst_seq):
    "Return required operations to mutate src_seq into dst_seq"
    src_set = set(src_seq)  # builds a new set (a copy if already a set)
    dst_set = set(dst_seq)  # ditto

    for item in src_set - dst_set:
        yield 'delete', item

    for item in dst_set - src_set:
        yield 'create', item

Use as follows:

for opcode, datum in set_reconcile(machine1_stuff, machine2_stuff):
    if opcode == 'create':
        pass  # act accordingly
    elif opcode == 'delete':
        pass  # likewise
    else:
        raise RuntimeError('unexpected opcode')
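As a rough check of the idea, the sketch below repeats the generator and applies its opcodes to a copy of the source set. The helper `apply_ops` and the sample data are hypothetical, made up for illustration only; the code assumes Python 3.

```python
def set_reconcile(src_seq, dst_seq):
    """Yield (opcode, item) pairs that turn src_seq's set into dst_seq's set."""
    src_set = set(src_seq)
    dst_set = set(dst_seq)
    for item in src_set - dst_set:
        yield 'delete', item
    for item in dst_set - src_set:
        yield 'create', item


def apply_ops(items, ops):
    """Hypothetical helper: apply reconciliation opcodes to a copy of items."""
    result = set(items)
    for opcode, datum in ops:
        if opcode == 'create':
            result.add(datum)
        elif opcode == 'delete':
            result.discard(datum)
        else:
            raise RuntimeError('unexpected opcode')
    return result


# Made-up sample data standing in for the identifiers held by each machine.
machine1_stuff = {'a', 'b', 'c'}
machine2_stuff = {'b', 'c', 'd'}

ops = list(set_reconcile(machine1_stuff, machine2_stuff))
# After applying the opcodes, the source set matches the destination set.
assert apply_ops(machine1_stuff, ops) == machine2_stuff
```

Note that `set_reconcile` takes both collections as arguments, so one machine has to receive the other's full list of identifiers before the difference can be computed. Only the resulting opcodes are proportional to the number of differing elements.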
tzot answered Nov 04 '22 13:11