
For classifying data into N classes, is there an alternative to using N yes-no classifiers?

TL;DR: is there any kind of classifier more sophisticated than a yes-no classifier?

I'll say up front that I don't have a specific project I'm working on, and this is more of a technique question I've been wondering about.

I've worked on a few machine learning applications for one reason or another. All of these projects were intended to classify data into one of N classes, and they all used N yes-no classifiers (if that's what they're called). Each of these classifiers gives a piece of data a score (0 to 1, or -1 to 1) corresponding to the likelihood that it belongs to the class the classifier was trained for. It's then up to the program to use those scores to determine the best classification somehow.

I've seen this on both nominal and continuous data, with different implementations of the final classification. For example, I once wrote a small document-language identifier in which classifiers were trained on English, French, German, etc., and whichever classifier gave the highest score won. This makes sense to me.
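For concreteness, here's a minimal sketch of that setup, assuming scikit-learn; the training snippets and language labels are toy stand-ins:

```python
# One-vs-rest language ID: one yes-no classifier per language, argmax wins.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

docs = ["the quick brown fox", "le renard brun rapide", "der schnelle braune Fuchs"]
labels = ["en", "fr", "de"]

# OneVsRestClassifier fits one binary classifier per class;
# predict() returns the class whose classifier scores highest.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
    OneVsRestClassifier(LogisticRegression()),
)
clf.fit(docs, labels)
print(clf.predict(["une phrase en francais"]))
```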

Another project classified data on a continuous scale, mostly from 0 to 1.2 but with some values up to 6. We made six or so classifiers and assigned them to bins: 0-0.2, 0.2-0.4, ..., and 1.0 and above. Once all the classifiers had returned for some piece of data, we fit a quadratic to the scores and took its peak as the result. This makes me uncomfortable, but I don't know why.
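Roughly, the final step looked like this (numpy only; the bin scores here are made up):

```python
import numpy as np

bin_centers = np.array([0.1, 0.3, 0.5, 0.7, 0.9, 1.1])   # bins 0-0.2, ..., 1.0+
scores = np.array([0.05, 0.20, 0.75, 0.90, 0.40, 0.10])  # one yes-no classifier each

# Fit scores ~ a*x^2 + b*x + c and report the parabola's peak.
a, b, c = np.polyfit(bin_centers, scores, deg=2)
peak = -b / (2 * a)  # vertex; only meaningful when a < 0 (concave fit)
print(peak)
```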

It seems like there should be a better way than just polling a set of yes-no classifiers and trying to decide based on some algorithm. To take a silly example, consider a system to decide whether a picture shows an onion or a mushroom. (This is literally the first thing I thought of.) I would argue that the more an object looks like an onion, the less it looks like a mushroom, and from an ontological perspective I want a classification method that reflects that. If I have two yes-no classifiers that don't take into account that onionity opposes mushroomness, what do I do about a picture that gets high scores from both? Is there some way to get a single, mushroom-or-onion classifier that somehow knows that there is no overlap between these two classes of vegetation? Or can I count on training the yes-no classifiers with real data to reflect this without any special intervention?

asked Feb 23 '23 by andronikus

1 Answer

There are two broad schools of classification:

1) Discriminative - Here we try to learn a decision boundary from the training examples. Then, based on which part of the space the test example lies in, as determined by the decision boundary, we assign it a class. The state-of-the-art algorithm is the SVM, but you need kernels if your data can't be separated by a line (e.g. if it is separable by a circle).

Modifications to SVM for Multi-class (many ways of doing this, here's one):

Let the jth (of k) training example xj be in class i (of N). Then its label yj = i.

a) Feature Vector: If xj is a training example belonging to class i (of N), then the feature vector corresponding to xj is phi(xj, yj) = [0 0 ... X ... 0]

  • Note: X sits in the ith "position" of the vector. phi has a total of D*N components, where each example has D features, e.g. a picture of an onion has D = 640*480 greyscale integers.

  • Note: For any other class p, i.e. y = p, phi(xj, y) has "X" in position p and zeros everywhere else.

b) Constraints: Minimize ||W||^2 (as in vanilla SVM) such that:

1) For all labels y except y1: W.phi(x1, y1) >= W.phi(x1, y) + 1

and 2) For all labels y except y2: W.phi(x2, y2) >= W.phi(x2, y) + 1

...

and k) For all labels y except yk: W.phi(xk, yk) >= W.phi(xk, y) + 1

  • Note: The intuition here is that for each example xj, W.phi(xj, yj) beats W.phi(xj, y) by a margin of at least 1 for every other label y.
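Here is a compact numpy sketch of this construction - essentially the Crammer-Singer multi-class SVM, trained by subgradient descent on the hinge-loss relaxation of the constraints rather than by a QP solver; the hyperparameters are placeholders:

```python
import numpy as np

def phi(x, y, n_classes):
    # Joint feature map: copy x into block y of a D*N vector, zeros elsewhere.
    out = np.zeros(len(x) * n_classes)
    out[y * len(x):(y + 1) * len(x)] = x
    return out

def train(X, ys, n_classes, lam=0.01, lr=0.1, epochs=100):
    # Subgradient descent on the multi-class hinge loss plus a ||W||^2 penalty.
    W = np.zeros(X.shape[1] * n_classes)
    for _ in range(epochs):
        for x, y in zip(X, ys):
            # The label that most violates W.phi(x, y_true) >= W.phi(x, y) + 1.
            scores = [W @ phi(x, c, n_classes) + (c != y) for c in range(n_classes)]
            y_hat = int(np.argmax(scores))
            grad = 2 * lam * W                 # regularizer: keep W small
            if y_hat != y:                     # a margin constraint is violated
                grad += phi(x, y_hat, n_classes) - phi(x, y, n_classes)
            W -= lr * grad
    return W

def predict(W, x, n_classes):
    return int(np.argmax([W @ phi(x, c, n_classes) for c in range(n_classes)]))
```

Note that W.phi(x, c) just reads off class c's block of weights, so this is the familiar shared-weight multi-class linear classifier in disguise.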

2) Generative - Here we ASSUME (which may turn out to be nonsense) that each example was generated by a probability distribution for its class - e.g. one Gaussian for male faces and one for female faces, which works well in practice. We try to learn the parameters of each distribution - its mean and covariance - by calculating the mean and covariance of the training examples belonging to that class. Then, for a test example, we see which distribution gives the highest probability and classify accordingly.
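A minimal sketch of that recipe, assuming numpy/scipy, equal class priors, and that X is an (n, d) array with ys a length-n array of integer class labels:

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gaussians(X, ys):
    # One (mean, covariance) pair per class, estimated from that class's examples.
    params = {}
    for c in np.unique(ys):
        Xc = X[ys == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def classify(x, params):
    # Pick the class whose fitted Gaussian assigns x the highest density.
    return max(params, key=lambda c: multivariate_normal.pdf(x, *params[c]))
```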

Neither uses N yes-no classifiers.

The discriminative method works better in practice for classification, but it can't give you probabilistic answers. It also needs a large number of training examples for the optimization step (minimizing ||W||^2) to converge. There is a technique that combines the two while avoiding kernels, called Maximum Entropy Discrimination.

To answer your other question:

what do I do about a picture that gets high scores from both? Is there some way to get a single, mushroom-or-onion classifier that somehow knows that there is no overlap between these two classes of vegetation?

This is more of a problem with the input data than with the learning algorithm itself, which just works on a matrix of numbers. It could reflect noise/uncertainty in the domain (i.e. can humans tell mushrooms apart from onions perfectly?). It may be fixed by a larger/better training dataset. Or maybe you picked a bad distribution to model, in the generative case.

Most people would pre-process the raw images, prior to classification, in a stage called Feature Selection. One feature selection technique could be to capture the silhouette of the vegetable, since mushrooms and onions have different shapes and the rest of the image may be "noise". In other domains, like natural language processing, you could drop prepositions and retain counts of the different nouns. But sometimes performance may not improve, because the learning algorithm might not look at all the features anyway. It really depends on what you're trying to capture - creativity is involved. Feature selection algorithms also exist.
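As a toy illustration of the silhouette idea - assuming a greyscale image already loaded as a 2-D numpy array, with the (big) assumption of a light, uncluttered background:

```python
import numpy as np

def silhouette_features(img, threshold=128):
    mask = img < threshold                # binary silhouette: object vs. background
    area = mask.mean()                    # fraction of the frame the object fills
    height = np.any(mask, axis=1).sum()   # rows that contain the object
    width = np.any(mask, axis=0).sum()    # columns that contain the object
    aspect = height / max(width, 1)       # crude shape cue for onion vs. mushroom
    return np.array([area, aspect])       # 2 features instead of 640*480 pixels
```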

Good resources for machine learning are Tony Jebara's courses at Columbia University.

answered Apr 30 '23 by satish b