Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm to group items in groups of 3

I am trying to solve a problem where I have pairs like:

A C
B F
A D
D C
F E
E B
A B
B C
E D
F D

and I need to group them in groups of 3 where I must have a triangule of matching from that list. Basically I need a result if its possible or not to group a collection.

So the possible groups are (ACD and BFE), or (ABC and DEF) and this collection is groupable since all letters can be grouped in groups of 3 and no one is left out.

I made a script where I can achieve this for small ammounts of input but for big ammounts it gets too slow.

My logic is:

make nested loop to find first match (looping untill I find a match)
 > remove 3 elements from the collection
  > run again

and I do this until I am out of letters. Since there can be different combinations I run this multiple times starting on different letters until I find a match.

I can understand that this gives me loops in order at least N^N and can get too slow. Is there a better logic for such problems? can a binary tree be used here?

like image 530
Rikard Avatar asked Oct 22 '16 15:10

Rikard


3 Answers

This problem can be modeled as a graph Clique cover problem. Every letter is a node and every pair is an edge and you want to partition the graph into vertex-disjoint cliques of size 3 (triangles). If you want the partitioning to be of minimum cardinality then you want a minimum clique cover.
Actually this would be a k-clique cover problem, because in the clique cover problem you can have cliques of arbitrary/different sizes.

like image 113
Alberto Rivelli Avatar answered Nov 11 '22 13:11

Alberto Rivelli


As Alberto Rivelli already stated, this problem is reducible to the Clique Cover problem, which is NP-hard. It is also reducible to the problem of finding a clique of particular/maximum size. Maybe there are others, not NP-hard problems to which your particular case could be reduced to, but I didn't think of any.

However, there do exist algorithms which can find the solution in polynomial time, although not always for worst cases. One of them is Bron–Kerbosch algorithm, which is known by far to be the most efficient algorithm for finding the maximum clique and can find a clique in the worst case of O(3^(n/3)). I don't know the size of your inputs, but I hope it will be sufficient for your problem.

Here is the code in Python, ready to go:

#!/usr/bin/python3
# @by DeFazer
# Solution to:
#  stackoverflow.com/questions/40193648/algorithm-to-group-items-in-groups-of-3
# Input:
#  N P - number of vertices and number of pairs
#  P pairs, 1 pair per line
# Output:
#  "YES" and groups themselves if grouping is possible, and "NO" otherwise
# Input example:
#  6 10
#  1 3
#  2 6
#  1 4
#  4 3
#  6 5
#  5 2
#  1 2
#  2 3
#  5 4
#  6 4
# Output example:
#  YES
#  1-2-3
#  4-5-6
# Output commentary:
#  There are 2 possible coverages: 1-2-3*4-5-6 and 2-5-6*1-3-4.
# If required, it can be easily modified to return all possible groupings rather than just one.

# Algorithm:
#  1) List *all* existing triangles (1-2-3, 1-3-4, 2-5-6...)
#  2) Build a graph where vertices represent triangles and edges connect these triangles with no common... vertices. Sorry for ambiguity. :)
#  3) Use [this](en.wikipedia.org/wiki/Bron–Kerbosch_algorithm) algorithm (slightly modified) to find a clique of size N/3.
#       The grouping is possible if such clique exists.

N, P = map(int, input().split())
assert (N%3 == 0) and (N>0)
cliquelength = N//3

pairs = {} # {a:{b, d, c}, b:{a, c, f}, c:{a, b}...}

# Get input

# [(0, 1), (1, 3), (3, 2)...]
##pairlist = list(map(lambda ab: tuple(map(lambda a: int(a)-1, ab)), (input().split() for pair in range(P))))
pairlist=[]
for pair in range(P):
    a, b = map(int, input().split())
    if a>b:
        b, a = a, b
    a, b = a-1, b-1
    pairlist.append((a, b))
pairlist.sort()

for pair in pairlist:
    a, b = pair

    if a not in pairs:
        pairs[a] = set()
    pairs[a].add(b)

# Make list of triangles
triangles = []
for a in range(N-2):
    for b in pairs.get(a, []):
        for c in pairs.get(b, []):
            if c in pairs[a]:
                triangles.append((a, b, c))
                break

def no_mutual_elements(sortedtupleA, sortedtupleB):
    # Utility function
    # TODO: if too slow, can be improved to O(n) since tuples are sorted. However, there are only 9 comparsions in case of triangles.
    return all((a not in sortedtupleB) for a in sortedtupleA)

# Make a graph out of that list
tgraph = [] # if a<b and (b in tgraph[a]), then triangles[a] has no common elements with triangles[b]
T = len(triangles)
for t1 in range(T):
    s = set()
    for t2 in range(t1+1, T):
        if no_mutual_elements(triangles[t1], triangles[t2]):
            s.add(t2)
    tgraph.append(s)

def connected(a, b):
    if a > b:
        b, a = a, b
    return (b in tgraph[a])

# Finally, the magic algorithm!

CSUB = set()
def extend(CAND:set, NOT:set) -> bool:
    # while CAND is not empty and there is no vertex in NOT connected to *all* vertexes in CAND
    while CAND and all((any(not connected(n, c) for c in CAND)) for n in NOT):
        v = CAND.pop()
        CSUB.add(v)
        newCAND = {c for c in CAND if connected(c, v)}
        newNOT = {n for n in NOT if connected(n, v)}
        if (not newCAND) and (not newNOT) and (len(CSUB)==cliquelength): # the last condition is the algorithm modification
            return True
        elif extend(newCAND, newNOT):
            return True
        else:
            CSUB.remove(v)
            NOT.add(v)

if extend(set(range(T)), set()):
    print("YES")
    # If the clique itself is not needed, it's enough to remove the following 2 lines
    for a, b, c in [triangles[c] for c in CSUB]:
        print("{}-{}-{}".format(a+1, b+1, c+1))
else:
    print("NO")

If this solution is still too slow, perphaps it may be more efficient to solve the Clique Cover problem instead. If that's the case, I can try to find a proper algorithm for it.

Hope that helps!

like image 37
DeFazer Avatar answered Nov 11 '22 13:11

DeFazer


Well i have implemented the job in JS where I feel most confident. I also tried with 100000 edges which are randomly selected from 26 letters. Provided that they are all unique and not a point such as ["A",A"] it resolves in like 90~500 msecs. The most convoluted part was to obtain the nonidentical groups, those without just the order of the triangles changing. For the given edges data it resolves within 1 msecs.

As a summary the first reduce stage finds the triangles and the second reduce stage groups the disconnected ones.

function getDisconnectedTriangles(edges){
 return edges.reduce(function(p,e,i,a){
                       var ce = a.slice(i+1)
                                 .filter(f => f.some(n => e.includes(n))), // connected edges
                           re = [];                                        // resulting edges
                       if (ce.length > 1){
                         re = ce.reduce(function(r,v,j,b){
                                          var xv = v.find(n => e.indexOf(n) === -1),  // find the external vertex
                                              xe = b.slice(j+1)                       // find the external edges
                                                    .filter(f => f.indexOf(xv) !== -1 );
                                          return xe.length ? (r.push([...new Set(e.concat(v,xe[0]))]),r) : r;
                                        },[]);
                       }
                       return re.length ? p.concat(re) : p;
                     },[])
             .reduce((s,t,i,a) => t.used ? s
                                         : (s.push(a.map((_,j) => a[(i+j)%a.length])
                                                    .reduce((p,c,k) => k-1 ? p.every(t => t.every(n => c.every(v => n !== v))) ? (c.used = true, p.push(c),p) : p
                                                                           : [p].every(t => t.every(n => c.every(v => n !== v))) ? (c.used = true, [p,c]) : [p])),s)
                                  ,[]);
}

var edges = [["A","C"],["B","F"],["A","D"],["D","C"],["F","E"],["E","B"],["A","B"],["B","C"],["E","D"],["F","D"]],
       ps = 0,
       pe = 0,
   result = [];

ps = performance.now();
result = getDisconnectedTriangles(edges);
pe = performance.now();
console.log("Disconnected triangles are calculated in",pe-ps, "msecs and the result is:");
console.log(result);

You may generate random edges in different lengths and play with the code here

like image 1
Redu Avatar answered Nov 11 '22 11:11

Redu