
Fuse tuples to find equivalence classes

Tags:

algorithm

Suppose we have a finite domain D={d1,..,dk} containing k elements.

We consider S, a subset of D^n, i.e. a set of tuples of the form < a1,..,an > with each ai in D.

We want to represent it (compactly) using S', a subset of (2^D)^n, i.e. a set of tuples of the form < A1,..,An > with each Ai being a subset of D. The requirement is that for any tuple s' in S', all elements in the cross product of its Ai exist in S, and that together the tuples of S' cover all of S.

For instance, consider D={a,b,c} so k=3, n=2 and the tuples S=< a,b >+< a,c >+< b,b >+< b,c >.

We can use S'=<{a,b},{b,c}> to represent S.

This singleton solution is also minimal. S'=<{a},{b,c}>+<{b},{b,c}> is also a solution, but it is larger and therefore less desirable.
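
To make the "represents" relation concrete, here is a minimal sketch that expands a candidate S' back into plain tuples and compares the result with S. The names expand and represents are illustrative only, and the code assumes domain elements are strings that do not contain the "\u0000" separator.

```javascript
// Expand one fused tuple (an array of components, each an array of
// elements) into the plain tuples of its cross product.
function expand(fused) {
  return fused.reduce(
    (acc, component) =>
      acc.flatMap(prefix => component.map(x => prefix.concat([x]))),
    [[]]
  );
}

// True when the fused tuples in sPrime cover exactly the tuples in s.
function represents(sPrime, s) {
  const key = t => t.join("\u0000"); // assumption: no element contains "\u0000"
  const expected = new Set(s.map(key));
  const actual = new Set(sPrime.flatMap(expand).map(key));
  if (expected.size !== actual.size) return false;
  for (const k of expected) {
    if (!actual.has(k)) return false;
  }
  return true;
}

const S = [["a", "b"], ["a", "c"], ["b", "b"], ["b", "c"]];
console.log(represents([[["a", "b"], ["b", "c"]]], S));                // true
console.log(represents([[["a"], ["b", "c"]], [["b"], ["b", "c"]]], S)); // true
```

Expanding is only practical for small checks like this one, since the cross product can be much larger than S' itself.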

Some sizes that we need to handle in concrete instances: k ~ 1000 elements in the domain D, n <= 10 relatively small (main source of complexity), and |S| ranging up to large values > 10^6.

A naïve approach first embeds S into the domain of S', (2^D)^n, by turning every element into a singleton set. It then tests tuples pairwise: two tuples s1, s2 in S' can be fused to form a single tuple in S' iff they differ on exactly one component.

e.g.
< a,b >+< a,c > -> <{a},{b,c}> (differ on second component)

< b,b >+< b,c > -> <{b},{b,c}> (differ on second component)

<{a},{b,c}> + <{b},{b,c}> -> <{a,b},{b,c}> (differ on first component)
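
The pairwise test above can be sketched as follows. This is only an illustration of the rule, not an efficient implementation; tryFuse and sameSet are hypothetical names, and components are represented as arrays of strings.

```javascript
// Compare two components as sets, ignoring order and duplicates.
function sameSet(x, y) {
  const sx = new Set(x);
  const sy = new Set(y);
  if (sx.size !== sy.size) return false;
  for (const e of sy) {
    if (!sx.has(e)) return false;
  }
  return true;
}

// Return the fused tuple when t1 and t2 differ on at most one
// component, or null when they differ on two or more.
function tryFuse(t1, t2) {
  let diff = -1;
  for (let i = 0; i < t1.length; i++) {
    if (!sameSet(t1[i], t2[i])) {
      if (diff !== -1) return null; // second differing component
      diff = i;
    }
  }
  const fused = t1.map(c => c.slice());
  if (diff !== -1) {
    // Union the differing component; sort to keep a canonical order.
    fused[diff] = Array.from(new Set(t1[diff].concat(t2[diff]))).sort();
  }
  return fused;
}

console.log(JSON.stringify(tryFuse([["a"], ["b"]], [["a"], ["c"]])));
// [["a"],["b","c"]]
console.log(tryFuse([["a"], ["b"]], [["b"], ["c"]])); // null
```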

Now there could be several minimal S'; we are interested in finding any one of them. Approximate minimisation is also OK, provided it never gives wrong results (i.e. S' need not be as small as it could be if that gets us results very fast).

The naive algorithm has to deal with the fact that any newly introduced "fused" tuple could match some other tuple, so it scales really badly on large input sets, even with n remaining low. You need |S'|^2 comparisons to ensure convergence, and every time I fuse two elements I currently retest every pair (how can I improve that?).

A lot of efficiency is iteration order dependent, so sorting the set in some way(s) could be an option, or perhaps indexing using hashes, but I'm not sure how to do it.
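
One way the hash-indexing idea could look, as a rough sketch rather than a known-optimal method: for each component i, group the tuples by a key built from all the other components, and union component i within each group. Tuples that land in the same group differ on component i only, so every merge is exact. The result depends on the order in which components are visited and is not guaranteed to be minimal. The function name fuseAll is hypothetical, and elements are assumed to be strings.

```javascript
// Greedy fusion using a hash index instead of pairwise comparison.
function fuseAll(tuples) {
  // Start from singleton components: <a,b> becomes <{a},{b}>.
  let current = tuples.map(t => t.map(x => [x]));
  const n = tuples.length ? tuples[0].length : 0;
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i < n; i++) {
      const groups = new Map();
      for (const t of current) {
        // Key on every component except i. Components are kept
        // sorted, so equal sets always produce equal keys.
        const key = JSON.stringify(t.map((c, j) => (j === i ? null : c)));
        const g = groups.get(key);
        if (g) t[i].forEach(x => g.union.add(x));
        else groups.set(key, { rest: t, union: new Set(t[i]) });
      }
      if (groups.size < current.length) changed = true;
      current = Array.from(groups.values(), g =>
        g.rest.map((c, j) => (j === i ? Array.from(g.union).sort() : c)));
    }
  }
  return current;
}

const fusedS = fuseAll([["a", "b"], ["a", "c"], ["b", "b"], ["b", "c"]]);
console.log(JSON.stringify(fusedS)); // [[["a","b"],["b","c"]]]
```

Each pass over one component hashes every tuple once, so a full sweep costs roughly n passes over S' instead of |S'|^2 comparisons, and sweeps repeat only while the number of tuples keeps shrinking.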

Imperative pseudo code would be ideal, or pointers to a reformulation of the problem to something I can run a solver on would really help.

Yann TM asked Jul 08 '15 00:07



1 Answer

Here's some pseudocode (C#-style code that I haven't tested) that demonstrates your S'=<{a},{b,c}>+<{b},{b,c}> method. Apart from the space requirements, which are negligible when using an integer index for each element, adding and testing tuples should be extremely fast. If you want a practical solution then you already have one; you just have to use the correct ADTs.

using System;
using System.Collections.Generic;

ElementType[] domain = new ElementType[k]; // a simple array of the k domain elements
FillDomain(domain); // insert all domain elements (helper not shown)
Array.Sort(domain); // sort the domain elements, K log K time
// the ints are indexes into the sorted domain array
SortedDictionary<int, HashSet<int>> subsets = new SortedDictionary<int, HashSet<int>>();
//
void AddTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    int a = Array.BinarySearch(domain, first); // log K time (binary search)
    int b = Array.BinarySearch(domain, second); // log K time (binary search)
    if(!tuples.TryGetValue(a, out HashSet<int> seconds)) { // log N time (lookup on sorted keys)
        seconds = new HashSet<int>(); // constant time (instance)
        tuples[a] = seconds;
    }
    seconds.Add(b); // constant time (hash add, ignores duplicates)
}
//
bool ContainsTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    int a = Array.BinarySearch(domain, first); // log K time (binary search)
    int b = Array.BinarySearch(domain, second); // log K time (binary search)
    // log N time (lookup on sorted keys), then constant time (hash test)
    return tuples.TryGetValue(a, out HashSet<int> seconds) && seconds.Contains(b);
}

The space savings from optimizing your tuple subset S' won't outweigh the slowdown of the optimization process itself. For size optimization, if you know your K will be less than 65536, you could use unsigned short integers (ushort) instead of integers in the SortedDictionary and HashSet. But even 50 million integers only take up 4 bytes per 32-bit integer * 50 million ~= 200 MB.

EDIT Here's another approach: by encoding/mapping your tuples to strings, you can take advantage of binary string comparison and the fact that UTF-16 / UTF-8 encoding is very size efficient. Again, this still doesn't do the merging optimization you want, but speed and efficiency would be pretty good.
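
The size estimate can be checked with a few lines of arithmetic. This sketch uses JavaScript typed-array element sizes as a stand-in for the C# integer widths; indexBytes is a hypothetical helper, and it counts only the raw index storage, not any container overhead.

```javascript
// Bytes needed to store `count` domain indexes, picking 16-bit
// storage when every index 0..k-1 fits in an unsigned 16-bit value.
function indexBytes(count, k) {
  const bytesPerIndex = k <= 65536
    ? Uint16Array.BYTES_PER_ELEMENT  // 2 bytes
    : Uint32Array.BYTES_PER_ELEMENT; // 4 bytes
  return count * bytesPerIndex;
}

console.log(indexBytes(50e6, 100000) / 1e6); // 200 (MB with 32-bit indexes)
console.log(indexBytes(50e6, 1000) / 1e6);   // 100 (MB with 16-bit indexes)
```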

Here's some quick pseudo code in JavaScript.

Array.prototype.binarySearch = function(elm) {
  var l = 0, h = this.length - 1, i; 
  while(l <= h) { 
    i = (l + h) >> 1; 
    if(this[i] < elm) l = ++i; 
    else if(this[i] > elm) h = --i; 
    else return i; 
  } 
  return -(++l); 
};
// map your ordered domain elements to characters 
// For example JavaScript's UTF-16 should be fine
// UTF-8 would work as well
var domain = {
  "a": String.fromCharCode(1),
  "b": String.fromCharCode(2),
  "c": String.fromCharCode(3),
  "d": String.fromCharCode(4)
}
var tupleStrings = [];
// map your tuple to the string encoding
function map(tuple) {
  var str = "";
  for(var i=0; i<tuple.length; i++) {
    str += domain[tuple[i]];
  }
  return str;
}
function add(tuple) {
  var str = map(tuple);
  // binary search; a negative result encodes the insertion point as -(pos + 1)
  var index = tupleStrings.binarySearch(str);
  if(index >= 0) return; // already present, don't store a duplicate
  index = ~index;
  // insert depends on tupleStrings' type implementation
  tupleStrings.splice(index, 0, str);
}
function contains(tuple) {
  var str = map(tuple);
  // binary search 
  return tupleStrings.binarySearch(str) >= 0;
}

add(["a","b"]);
add(["a","c"]);
add(["b","b"]);
add(["b","c"]);
add(["c","c"]);
add(["d","a"]);
alert(contains(["a","a"]));
alert(contains(["d","a"]));
alert(JSON.stringify(tupleStrings, null, "\n"));
Louis Ricci answered Oct 25 '22 02:10