Suppose you have a list of subsets S1,...,Sn
of the integer range R={1,2,...,N}
, and an integer k
. Is there an efficient way to find a subset C
of R
of size k
such that C
is a subset of a maximal number of the Si
?
As an example, let R={1,2,3,4}
and k=2
S1={1,2,3}
S2={1,2,3}
S3={1,2,4}
S4={1,3,4}
Then I want to return either C={1,2}
or C={1,3}
(doesn't matter which).
I think your problem is NP-Hard. Consider the bipartite graph with the left nodes being your sets and the right nodes being the integers {1, ..., N}
, with an edge between two nodes if the set contains the integer. Then, finding a common subset of size k
, which is a subset of a maximal number of the Si
, is equivalent to finding a complete bipartite subgraph K(i, k)
with maximal number of edges i*k
. If you could do this in polynomial time, then, you could find the complete bipartite subgraph K(i, j)
with maximal number of edges i*j
in polynomial time, by trying for each fixed k
. But this problem in NP-Complete (Complete bipartite graph).
So, unless P=NP, your problem does not have a polynomial time algorithm.
Assuming I understand your question I believe this is straightforward for fairly small sets.
I will use Mathematica code for illustration, but the concept is universal.
I generate 10
random subsets of length 4
, from the set {1 .. 8}:
ss = Subsets[Range@8, {4}] ~RandomSample~ 10
{{1, 3, 4, 6}, {2, 6, 7, 8}, {3, 5, 6, 7}, {2, 4, 6, 7}, {1, 4, 5, 8}, {2, 4, 6, 8}, {1, 2, 3, 8}, {1, 6, 7, 8}, {1, 2, 4, 7}, {1, 2, 5, 7}}
I convert these to a binary array of the presence of each number in each subset:
a = Normal@SparseArray[Join @@ MapIndexed[Tuples[{##}] &, ss] -> 1];
Grid[a]
That is ten columns for ten subsets, and eight rows for elements {1 .. 8}.
Now generate all possible target subsets (size 3
):
keys = Subsets[Union @@ ss, {3}];
Take a "key" and extract those rows from the array and do a BitAnd operation (return 1
iff all columns equal 1
), then count the number of ones. For example, for key {1, 6, 8}
we have:
a[[{1, 6, 8}]]
After BitAnd:
Do this for each key:
counts = Tr[BitAnd @@ a[[#]]] & /@ keys;
Then find the position(s) of the maximum element of that list, and extract the corresponding parts of keys
:
keys ~Extract~ Position[counts, Max@counts]
{{1, 2, 7}, {2, 4, 6}, {2, 4, 7}, {2, 6, 7}, {2, 6, 8}, {6, 7, 8}}
With adequate memory this process works quickly for a larger set. Starting with 50,000 randomly selected subsets of length 7
from {1 .. 30}:
ss = Subsets[Range@30, {7}] ~RandomSample~ 50000;
The maximum sub-subsets of length 4
are calculated in about nine seconds:
AbsoluteTiming[
a = Normal@SparseArray[Join @@ MapIndexed[Tuples[{##}] &, ss] -> 1];
keys = Subsets[Union @@ ss, {4}];
counts = Tr[BitAnd @@ a[[#]]] & /@ keys;
keys~Extract~Position[counts, Max@counts]
]
{8.8205045, {{2, 3, 4, 20}, {7, 10, 15, 18}, {7, 13, 16, 26}, {11, 21, 26, 28}}}
I should add that Mathematica is a high level language and these operations are on generic objects, therefore if this is done truly at the binary level this should be much faster, and more memory efficient.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With