I have a collection of objects with properties. I want to find the simplest set of criteria that will specify exactly one of these objects (I do not care which one).
For example, given {a=1, b=1, c=1}, {a=1, b=2, c=1}, {a=1, b=1, c=2}, specifying b==2 (or c==2) will give me a unique object.
Likewise, given {a=1, b=1, c=1}, {a=1, b=2, c=2}, {a=1, b=2, c=1}, specifying b==2 && c==2 (or b==1 && c==1, or b==2 && c==1) will give me a unique object.
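To make the goal concrete, here is a brute-force sketch of what I mean (Python; minimal_criteria is just an illustrative name, and the search is exponential in the number of attributes, which is exactly why I am looking for the proper formulation):

from itertools import combinations, product

def minimal_criteria(objects):
    # Try criteria sets of increasing size and return the first dict of
    # {attribute: value} tests that matches exactly one object.
    # Returns None if no such set exists (e.g. duplicate objects).
    attributes = sorted(objects[0])
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            for values in product(*({obj[a] for obj in objects} for a in attrs)):
                criteria = dict(zip(attrs, values))
                matches = [obj for obj in objects
                           if all(obj[a] == v for a, v in criteria.items())]
                if len(matches) == 1:
                    return criteria
    return None

objects = [{'a': 1, 'b': 1, 'c': 1},
           {'a': 1, 'b': 2, 'c': 1},
           {'a': 1, 'b': 1, 'c': 2}]
print(minimal_criteria(objects))   # {'b': 2} -- c==2 would work equally well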
This sounds like a known problem, with a known solution, but I haven't been able to find the correct formulation of the problem to allow me to Google it.
It is indeed a known problem in AI - feature selection. There are many algorithms for doing this. Just Google "feature selection" "artificial intelligence".
The main issue is that when the sample set is large, you need to use some sort of heuristic in order to reach a solution within a reasonable time.
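For example, a simple greedy heuristic (only a sketch, assuming every object has the same attributes; greedy_criteria is an illustrative name, and the result is not guaranteed to be minimal):

def greedy_criteria(objects):
    # Repeatedly add the attribute==value test that leaves the smallest
    # strictly smaller set of matching objects, until one object remains.
    candidates = list(objects)
    criteria = {}
    while len(candidates) > 1:
        best = None
        for attr in candidates[0]:
            if attr in criteria:
                continue
            for value in {obj[attr] for obj in candidates}:
                matched = [obj for obj in candidates if obj[attr] == value]
                if len(matched) < len(candidates) and \
                        (best is None or len(matched) < len(best[2])):
                    best = (attr, value, matched)
        if best is None:
            return None   # remaining objects are indistinguishable
        attr, value, candidates = best
        criteria[attr] = value
    return criteria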
Feature Selection in Data Mining
The main idea of feature selection is to choose a subset of input variables by eliminating features with little or no predictive information.
The freedom of choosing the target is sort of unusual. If the target is specified, then this is essentially the set cover problem. Here are two corresponding instances side by side; a code sketch of the correspondence follows the instance.
A={1,2,3} B={2,4} C={3,4} D={4,5}
0: {a=0, b=0, c=0, d=0} # separate 0 from the others
1: {a=1, b=0, c=0, d=0}
2: {a=1, b=1, c=0, d=0}
3: {a=1, b=0, c=1, d=0}
4: {a=0, b=1, c=1, d=1}
5: {a=0, b=0, c=0, d=1}
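A sketch of that correspondence (Python, using the standard greedy set-cover approximation; criteria_via_set_cover is an illustrative name): each attribute "covers" the items it distinguishes from the chosen target, and a small cover of the other items is a small set of criteria.

def criteria_via_set_cover(objects, target_index=0):
    # Each attribute covers the set of other objects whose value differs
    # from the target's; a cover of all other objects yields criteria that
    # match only the target.  Greedy set cover gives a log-factor
    # approximation of the minimum.
    target = objects[target_index]
    others = [i for i in range(len(objects)) if i != target_index]
    covers = {attr: {i for i in others if objects[i][attr] != target[attr]}
              for attr in target}
    uncovered = set(others)
    chosen = []
    while uncovered:
        attr = max(covers, key=lambda a: len(covers[a] & uncovered))
        if not covers[attr] & uncovered:
            return None   # target is indistinguishable from some other object
        chosen.append(attr)
        uncovered -= covers[attr]
    return {attr: target[attr] for attr in chosen}

items = [
    {'a': 0, 'b': 0, 'c': 0, 'd': 0},   # 0: the target to separate
    {'a': 1, 'b': 0, 'c': 0, 'd': 0},   # 1
    {'a': 1, 'b': 1, 'c': 0, 'd': 0},   # 2
    {'a': 1, 'b': 0, 'c': 1, 'd': 0},   # 3
    {'a': 0, 'b': 1, 'c': 1, 'd': 1},   # 4
    {'a': 0, 'b': 0, 'c': 0, 'd': 1},   # 5
]
print(criteria_via_set_cover(items))    # e.g. {'a': 0, 'd': 0}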
Set cover is NP-hard. Your problem, however, has an m^(log n + O(1)) · poly(n)-time algorithm, where m is the number of attributes and n is the number of items (the optimal set of criteria has size at most log n, so it suffices to enumerate criteria sets up to that size), which makes it rather unlikely that an NP-hardness proof is forthcoming. I'm reminded of the situation with the Junta problem (basically the theoretical formulation of feature selection).
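That size bound is constructive; here is a sketch (Python, assuming no two objects are identical; halving_criteria is an illustrative name): any attribute that is not constant on the current candidates splits them into at least two groups, and keeping a smallest group at least halves the candidate set.

import math

def halving_criteria(objects):
    # Pick any attribute that is not constant on the current candidates,
    # keep a smallest value-group, and repeat.  Each step at least halves
    # the candidate set, so at most ceil(log2 n) criteria are chosen.
    candidates = list(objects)
    criteria = {}
    while len(candidates) > 1:
        for attr in candidates[0]:
            groups = {}
            for obj in candidates:
                groups.setdefault(obj[attr], []).append(obj)
            if len(groups) > 1:
                value, candidates = min(groups.items(), key=lambda kv: len(kv[1]))
                criteria[attr] = value
                break
        else:
            return None   # duplicates: no attribute separates the candidates
    assert len(criteria) <= math.ceil(math.log2(len(objects)))
    return criteria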