
Algorithm for best-effort classification of vector

Given four binary vectors which represent "classes":

[1,0,0,0,0,0,0,0,0,0]
[0,0,0,0,0,0,0,0,0,1]
[0,1,1,1,1,1,1,1,1,0]
[0,1,0,0,0,0,0,0,0,0]

What methods are available for classifying a vector of floating point values into one of these "classes"?

Basic rounding works in most cases:

round([0.8,0,0,0,0.3,0,0.1,0,0,0]) = [1 0 0 0 0 0 0 0 0 0] 

But how can I handle some interference?

round([0.8,0,0,0,0.6,0,0.1,0,0,0]) = [1 0 0 0 1 0 0 0 0 0]

This second case should still match the first class, [1,0,0,0,0,0,0,0,0,0], but the rounded vector matches none of the classes, so the solution is lost entirely.

I want to use MATLAB for this task.

Blair asked Mar 01 '23 04:03


2 Answers

Find the SSD (sum of squared differences) of your test vector with each "class" and use the one with the least SSD.

Here's some code. I added a 0 to the end of the test vector you provided, since it had only 9 elements whereas the classes have 10.

CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

% Find the difference between the TEST vector and each row in CLASSES
difference = bsxfun(@minus,CLASSES,TEST);
% Sum of squared differences for each class (one value per row)
class_diff = sum(difference.^2,2);
% Store the row index of the vector with the minimum difference from TEST
[val, CLASS_ID] = min(class_diff);
% Display the best-matching class
disp(CLASSES(CLASS_ID,:))

For illustrative purposes, difference looks like this:

 0.2    0   0   0   -0.6    0   -0.1    0   0   0
-0.8    0   0   0   -0.6    0   -0.1    0   0   1
-0.8    1   1   1    0.4    1    0.9    1   1   0
-0.8    1   0   0   -0.6    0   -0.1    0   0   0

And the squared distance of each class from TEST, class_diff, looks like this:

 0.41
 2.01
 7.61
 2.01

The first class is the best match, since it has the smallest SSD.
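
As a side note, minimizing the SSD picks the same class as minimizing the Euclidean distance, because the square root is monotonic. Here is a minimal sketch of the same steps, assuming MATLAB R2016b or later, where implicit expansion makes bsxfun unnecessary. The names ssd and class_id are just illustrative.

```matlab
% Class prototypes, one per row
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

% Implicit expansion subtracts TEST from every row of CLASSES
% (assumes R2016b or later; on older releases keep bsxfun)
ssd = sum((CLASSES - TEST).^2, 2);

% Row index of the class with the smallest sum of squared differences
[~, class_id] = min(ssd);

disp(class_id)
```

On older releases the bsxfun version above remains the one to use.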

Jacob answered Mar 07 '23 13:03


This is the same approach as Jacob's, only with four different distance measures:

  • Euclidean distance
  • City-block distance
  • Cosine distance
  • Chebychev distance

%%
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

%%
% sqrt( sum((x-y).^2) )
euclidean = sqrt( sum(bsxfun(@minus,CLASSES,TEST).^2, 2) );

% sum( |x-y| )
cityblock = sum(abs(bsxfun(@minus,CLASSES,TEST)), 2);

% 1 - dot(x,y)/(sqrt(dot(x,x))*sqrt(dot(y,y)))
cosine = 1 - ( CLASSES*TEST' ./ (norm(TEST)*sqrt(sum(CLASSES.^2,2))) );

% max( |x-y| )
chebychev = max( abs(bsxfun(@minus,CLASSES,TEST)), [], 2 );

dist = [euclidean cityblock cosine chebychev];

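%%
% min operates column-wise, so classIdx(k) is the row index of the best class under the k-th measure
[minDist, classIdx] = min(dist);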

Pick the one you like :)
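
To see what each measure picks for this TEST vector, something like the following could be run on its own. It is only a sketch: it recomputes the distances with implicit expansion (assuming R2016b or later), and the names D, names and picked are made up for the example.

```matlab
CLASSES = [1,0,0,0,0,0,0,0,0,0
           0,0,0,0,0,0,0,0,0,1
           0,1,1,1,1,1,1,1,1,0
           0,1,0,0,0,0,0,0,0,0];

TEST = [0.8,0,0,0,0.6,0,0.1,0,0,0];

% One row of differences per class (implicit expansion, R2016b or later)
D = CLASSES - TEST;

% Columns: Euclidean, city-block, cosine, Chebychev
euclidean = sqrt(sum(D.^2, 2));
cityblock = sum(abs(D), 2);
cosine    = 1 - (CLASSES*TEST') ./ (norm(TEST)*sqrt(sum(CLASSES.^2, 2)));
chebychev = max(abs(D), [], 2);
dist = [euclidean cityblock cosine chebychev];

% min works column by column, so picked(k) is the class chosen by measure k
[~, picked] = min(dist);

names = {'euclidean', 'cityblock', 'cosine', 'chebychev'};
for k = 1:numel(names)
    fprintf('%-10s -> class %d\n', names{k}, picked(k));
end
```

For this TEST vector all four measures should agree on the first class. They can disagree on other inputs, which is when the choice of measure starts to matter.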

Amro answered Mar 07 '23 11:03