Fuse tuples to find equivalence classes

Q: How do you show that two equivalence classes are equal?

For each a,b∈A, a&sim;b if and only if [a]=[b]. Two elements of A are equivalent if and only if their equivalence classes are equal. Any two equivalence classes are either equal or they are disjoint. This means that if two equivalence classes are not disjoint then they must be equal.

Tags:

algorithm

Suppose we have a finite domain D={d1,..dk} containg k elements.

We consider S a subset of D^n, i.e. a set of tuples of the form < a1,..,an >, with ai in D.

We want to represent it (compactly) using S' a subset of 2^D^n, i.e. a set of tuples of the form < A1,..An > with Ai being subsets of D. The implication is that for any tuple s' in S' all elements in the cross product of Ai exist in S.

For instance, consider D={a,b,c} so k=3, n=2 and the tuples S=< a,b >+< a,c >+< b,b >+< b,c >.

We can use S'=<{a,b},{b,c}> to represent S.

This singleton solution is also minimal, S'=<{a},{b,c}>+<{b},{b,c}> is also a solution but it is larger, therefore less desirable.

Some sizes, in concrete instances, that we need to handle : k ~ 1000 elements in the domain D, n <= 10 relatively small (main source of complexity), |S| ranging to large values > 10^6.

A naïve approach consists in first plunging S into the domain of S' 2^D^n, then using the following test, two by two, two tuples s1,s2 in S' can be fused to form a single tuple in S' iff. they differ by only one component.

e.g.
< a,b >+< a,c > -> <{a},{b,c}> (differ on second component)

< b,b >+< b,c > -> <{b},{b,c}> (differ on second component)

<{a},{b,c}> + <{b},{b,c}> -> <{a,b},{b,c}> (differ on first component)

Now there could be several minimal S', we are interested in finding any one, and approximations of minimisation of some kind are also ok, provided they don't give wrong results (i.e. even if S' is not as small as it could be, but we get very fast results).

Naive algorithm has to deal with the fact that any newly introduced "fused" tuple could match with some other tuple so it scales really badly on large input sets, even with n remaining low. You need |S'|^2 comparisons to ensure convergence, and any time you do fuse two elements, I'm currently retesting every pair (how can I improve that ?).

A lot of efficiency is iteration order dependent, so sorting the set in some way(s) could be an option, or perhaps indexing using hashes, but I'm not sure how to do it.

Imperative pseudo code would be ideal, or pointers to a reformulation of the problem to something I can run a solver on would really help.

533

asked Jul 08 '15 00:07

Yann TM

1 Answers

Here's some psuedo (C# code that I haven't tested) that demonstrates your S'=<{a},{b,c}>+<{b},{b,c}> method. Except for the space requirements, which when using an integer index for the element are negligible; the overall efficiency and speed for Add'ing and Test'ing tuples should be extremely fast. If you want a practical solution then you already have one you just have to use the correct ADTs.

ElementType[] domain = new ElementType[]; // a simple array of domain elements
  FillDomain(domain); // insert all domain elements
  SortArray(domain); // sort the domain elements  K log K time
SortedDictionary<int, HashSet<int>> subsets; // int's are index/ref into domain
subsets = new SortedDictionary<int, HashSet<int>>();
//
void AddTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, elementType second) {
    int a = BinarySearch(domain, first); // log K time (binary search)
    int b = BinarySearch(domain, second); // log K time (binary search)
    if(tuples.ContainsKey(a)) { // log N time (binary search on sorted keys)
        if(!tuples[a].Contains(b)) { // constant time (hash lookup)
            tuples[a].Add(b); // constant time (hash add)
        }         
    } else { // constant time (instance + hash add)
        tuples[a] = new HashSet<in>();
        tuples[a].Add(b);
    }
}
//
bool ContainsTuple(SortedDictionary<int, HashSet<int>> tuples, ElementType[] domain, ElementType first, ElementType second) {
    int a = BinarySearch(domain, first); // log K time (binary search)
    int b = BinarySearch(domain, second); // log K time (binary search)
    if(tuples.ContainsKey(a)) { // log N time (binary search on sorted keys)
        if(tuples[a].Contains(b)) { // constant time (hash test)
            return true;
        }
    }
    return false;
}

The space savings for optimizing your tuple subset S' won't outweight the slowdown of the optimization process itself. For size optimization (if you know you're K will be less than 65536 you could use short integers instead of integers in the SortedDictionary and HashSet. But even 50 mil integers only take up 4 bytes per 32bit integer * 50 mil ~= 200 MB.

EDIT Here's another approach by encoding/mapping your tuples to a string you can take advantage of binary string compare and the fact that UTF-16 / UTF-8 encoding is very size efficient. Again this still doesn't doing the merging optimization you want, but speed and efficiency would be pretty good.

Here's some quick pseudo code in JavaScript.

Array.prototype.binarySearch = function(elm) {
  var l = 0, h = this.length - 1, i; 
  while(l <= h) { 
    i = (l + h) >> 1; 
    if(this[i] < elm) l = ++i; 
    else if(this[i] > elm) h = --i; 
    else return i; 
  } 
  return -(++l); 
};
// map your ordered domain elements to characters 
// For example JavaScript's UTF-16 should be fine
// UTF-8 would work as well
var domain = {
  "a": String.fromCharCode(1),
  "b": String.fromCharCode(2),
  "c": String.fromCharCode(3),
  "d": String.fromCharCode(4)
}
var tupleStrings = [];
// map your tuple to the string encoding
function map(tuple) {
  var str = "";
  for(var i=0; i<tuple.length; i++) {
    str += domain[tuple[i]];
  }
  return str;
}
function add(tuple) {
  var str = map(tuple);
  // binary search
  var index = tupleStrings.binarySearch(str);
  if(index < 0) index = ~index;
  // insert depends on tupleString's type implementation
  tupleStrings.splice(index, 0, str);
}
function contains(tuple) {
  var str = map(tuple);
  // binary search 
  return tupleString.binarySearch(str) >= 0;
}

add(["a","b"]);
add(["a","c"]);
add(["b","b"]);
add(["b","c"]);
add(["c","c"]);
add(["d","a"]);
alert(contains(["a","a"]));
alert(contains(["d","a"]));
alert(JSON.stringify(tupleStrings, null, "\n"));

134

answered Oct 25 '22 02:10

Louis Ricci

Related questions
                            
                                dijkstra's algorithm - in c++?
                            
                                determine if a string has all unique characters?
                            
                                How do I iterate over Binary Tree?
                            
                                bit vector implementation of set in Programming Pearls, 2nd Edition
                            
                                Number of substrings in a string: n-squared or exponential
                            
                                Algorithm - How to delete duplicate elements in a list efficiently?
                            
                                Find four consecutive numbers that sum to given number
                            
                                What is the right way to solve Codility's PermMissingElem test? (Java)
                            
                                Determining if two line segments intersect? [duplicate]
                            
                                Factorial in C without conditionals, loops and arithmetic operators
                            
                                Why isn't speech recognition advancing? [closed]
                            
                                Calculate shortest path through a grocery store
                            
                                Variation of the backpack ... in python
                            
                                How to do algorithm visualization?
                            
                                Clojure DAG (Bayesian Network)
                            
                                Minimizing chunks in a matrix
                            
                                Algorithm for finding minimal cycles in a graph
                            
                                Speeding up the L1 distance between all pairs in a ground set

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With