Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Group tuples such that each item in a group shares no common elements with other members

So I've run into a real world problem that goes like this: we have a list of pairs that we need to sort into groups. We want to minimize the number of groups that we have total, with the constraint that any member of a group cannot share an element with any other member of the group.

Here is an example. Our list of tuples is (A, B), (B, C), (C, A), (D, E), (F, G). We can form three groups by doing [(A, B), (D, E), (F, G)], [(B, C)], [(C, A)].

Is it possible to optimally solve this problem in polynomial time? How bad is the greedy solution? It is possible this has been posed as a different problem, but I can't quite figure out how to reduce it to something else (graph coloring comes to mind).

like image 591
Alex Alifimoff Avatar asked Aug 15 '16 22:08

Alex Alifimoff


1 Answers

The problem as stated can be thought of as the edge-coloring problem: you have a graph where each tuple entry is a node and the edges are given by the tuples. Clustering the tuples into groups then corresponds to finding groups of edges that don't share endpoints (matchings), which can then be assigned the same color in an edge coloring. In other words, each edge coloring gives you a clustering and each clustering gives you an edge coloring. Unfortunately, it's NP-hard to find the best edge coloring, so your problem is in general NP-hard. There are approximation algorithms for this problem that produce constant-factor approximations, but unless P=NP there's no exact algorithm.

If you generalize this problem to allow any number of elements per tuple, then this problem gets a lot harder. The general version of this problem is indeed NP-hard, and really hard to approximate, by a reduction from graph coloring. I'll show an example of the reduction in a specific case, but it generalizes quite nicely.

Given a graph like this one:

 A -- B -- C
 |    |    |
 D -- E -- F

We'll create a set of tuples, one for each node, where each entry in the tuple is the set of edges adjacent to that node. For example, in the above graph, we'd form these tuples:

( AB, AD )
( AB, BC, BE )
( CB, CF )
( AD, DE )
( BE, DE, EF )
( CF, EF)

Now, imagine that two of these tuples don't overlap. This means that the two nodes corresponding to those tuples must not be adjacent, since if they were, the edge between them would be a common element in the tuples and therefore they couldn't be clustered. On the other hand, if two nodes aren't adjacent, then their tuples can be grouped together in the same cluster, since no edge in one tuple will be present in the other.

Given this setup, any coloring of the original graph gives a way of clustering the tuples (put all tuples for nodes given the same color into the same set; none of them are adjacent, so they don't conflict), and any way of clustering the tuples gives a coloring (color all nodes corresponding to each tuple in a cluster the same color). Therefore, finding the minimum number of clusters corresponds to finding the chromatic number of the original graph, which is NP-hard and not known to admit any approximation algorithms that get anywhere close to the true value.

like image 124
templatetypedef Avatar answered Oct 23 '22 03:10

templatetypedef