Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding smallest set of criteria for uniqueness

I have a collection of objects with properties. I want to find the simplest set of criteria that will specify exactly one of these objects (I do not care which one).

For example, given {a=1, b=1, c=1}, {a=1, b=2, c=1}, {a=1, b=1, c=2}, specifying b==2 (or c==2) will give me an unique object.

Likewise, given {a=1, b=1, c=1}, {a=1, b=2, c=2}, {a=1, b=2, c=1}, specifying b==2 and c==2 (or b==1 && c==1 or b==2 && c==1) will give me an unique object.

This sounds like a known problem, with a known solution, but I haven't been able to find the correct formulation of the problem to allow me to Google it.

like image 346
Rasmus Faber Avatar asked Dec 05 '11 10:12

Rasmus Faber


2 Answers

It is indeed a known problem in AI - feature selection. There are many algorithms for doing this Just Google "feature selection" "artificial intelligence".

The main issue is that when the samples set is large, you need to use some sort of heuristics in order to reach a solution within a reasonable time.


Feature Selection in Data Mining

The main idea of feature selection is to choose a subset of input variables by eliminating features with little or no predictive information.

like image 110
OSH Avatar answered Sep 28 '22 10:09

OSH


The freedom of choosing the target is sort of unusual. If the target is specified, then this is essentially the set cover problem. Here's two corresponding instances side by side.

A={1,2,3} B={2,4} C={3,4} D={4,5}

0: {a=0, b=0, c=0, d=0}  # separate 0 from the others
1: {a=1, b=0, c=0, d=0}
2: {a=1, b=1, c=0, d=0}
3: {a=1, b=0, c=1, d=0}
4: {a=0, b=1, c=1, d=1}
5: {a=0, b=0, c=0, d=1}

While set cover is NP-hard, however, your problem has an O(mlog n + O(1) poly(n)) algorithm where m is the number of attributes and n is the number of items (the optimal set of criteria has size at most log n), which makes it rather unlikely that an NP-hardness proof is forthcoming. I'm reminded of the situation with the Junta problem (basically the theoretical formulation of feature selection).

like image 30
Per Avatar answered Sep 28 '22 11:09

Per