Using Bhattacharyya Distance for feature selection

I have a set of 240 features extracted using image processing. The objective is to classify test cases into 7 different classes after training. For each class there are about 60 observations (that is, I have around 60 feature vectors per class, each with 240 components).

Many research papers and books use Sequential Forward Search (SFS) or Sequential Backward Search (SBS) to select the best features from a feature vector.

[Figure: snapshot of the SFS algorithm]

Any such algorithm uses some criterion to discriminate between features. A common choice is the Bhattacharyya distance, which is a divergence-type measure between distributions. From my reading, given a matrix M1 for a class A that holds all 60 feature vectors of that class (n=60 rows and m=240 columns, since there are 240 features in total) and a similar matrix M2 for a class B, I can compute the Bhattacharyya distance between them and so measure how far apart the two classes are.
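For reference, one common way to compute this from M1 and M2 is the closed form for two Gaussians. This is a sketch under the assumption (not stated in the question) that each class is modelled as a multivariate Gaussian, with the mean vectors \mu_A, \mu_B and covariance matrices \Sigma_A, \Sigma_B estimated from the rows of M1 and M2:

```latex
D_B(A,B) = \frac{1}{8}\,(\mu_A-\mu_B)^{\top}\,\Sigma^{-1}\,(\mu_A-\mu_B)
         + \frac{1}{2}\,\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_A\,\det\Sigma_B}},
\qquad \Sigma = \frac{\Sigma_A+\Sigma_B}{2}
```

To score a subset of features, the same expression would be evaluated on only the corresponding columns of M1 and M2. Note that a sample covariance estimated from 60 observations has rank at most 59, so under this model the expression is only usable for subsets of fewer than 60 features.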

My question is: how do I integrate the two? How do I use the Bhattacharyya distance as the criterion for selecting the best features in the algorithm described above?

asked Oct 26 '13 14:10 by Sohaib


2 Answers

With help from Arthur B. I finally understood the concept. Here is my implementation of it. I actually used the Plus-l-Take-away-r algorithm (Sequential Forward Backward Search), but I'll post that, as it is basically the same as SFS once the backward search is removed. The implementation below is in MATLAB but is very simple to understand:

S=zeros(Size,1);   % Binary selection mask over all Size features; all zeros means no feature is selected yet
k=0;               % Number of features currently selected
while k<n          % n is the number of features that need to be extracted
    t=k+l;         % l is the number of features to be added in each iteration
    while k<t      % Sequential Forward Search: add l features
        R=zeros(Size,1);   % Criterion value for each candidate feature
        for i=1:Size
            if S(i)==0     % Only consider features that have not been selected yet
                S_copy=S;
                S_copy(i)=1;
                R=OperateBhattacharrya(Matrices,S_copy,i,e,R);  % The result for candidate i is stored in R
            end
        end
        k=k+1;
        [~,N]=max(R);      % The index of the maximum element in R is the best feature to add
        S(N)=1;            % Mark the selected feature with a 1
    end
    t=k-r;         % r is the number of features to be removed after adding l features; l>r
    while k>t      % Sequential Backward Search: remove r features
        R=zeros(Size,1);
        for i=1:Size
            if S(i)==1     % Only consider features that are currently selected
                S_copy=S;
                S_copy(i)=0;
                R=OperateBhattacharrya(Matrices,S_copy,i,1,R);
            end
        end
        k=k-1;
        [~,N]=max(R);      % Removing feature N leaves the subset with the largest value in R
        S(N)=0;
    end
    fprintf('Iteration :%d--%d\n',k,t);
end

I hope this helps anyone who has a similar problem.

answered Nov 15 '22 08:11 by Sohaib


That's the "evaluate the branch" part of the algorithm, except that you first apply the Bhattacharyya distance to one-dimensional vectors (one candidate feature), then to two-dimensional vectors (the selected feature plus one candidate), and so on.
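A minimal MATLAB sketch of that idea follows; it is not code from either answer. It assumes each class is modelled as a Gaussian, so that the distance has a closed form, and it scores a feature subset by the sum of pairwise class distances, which is only one possible multi-class criterion. The names `gaussBD`, `bhatt` and `classes`, and the toy data, are all hypothetical.

```matlab
% Sketch only: SFS with a Gaussian Bhattacharyya distance as the criterion.
% gaussBD, bhatt, classes and the toy data are hypothetical illustrations.

% Distance between two Gaussians with row-vector means m1, m2 and
% covariance matrices C1, C2.
gaussBD = @(m1, m2, C1, C2) ...
    (m1 - m2) * (((C1 + C2) / 2) \ (m1 - m2)') / 8 + ...
    0.5 * log(det((C1 + C2) / 2) / sqrt(det(C1) * det(C2)));

% Distance between two classes given as observation-by-feature matrices.
bhatt = @(X1, X2) gaussBD(mean(X1, 1), mean(X2, 1), cov(X1), cov(X2));

% Deterministic toy data: 3 classes, 30 observations, 4 features.
% Only features 1 and 2 have class-dependent means.
nClasses = 3; nObs = 30; nFeat = 4;
freq  = [0.7 1.3 1.9 2.5];   % per-feature frequency of the pseudo-noise
shift = [4 2 0 0];           % per-feature mean offset between classes
classes = cell(1, nClasses);
for c = 1:nClasses
    noise = sin((1:nObs)' * freq + c);
    classes{c} = noise + repmat((c - 1) * shift, nObs, 1);
end

% Sequential forward search: the subset grows by one feature per step, so
% the criterion is evaluated on 1-D data first, then 2-D data, and so on.
nSelect = 2;
selected = [];
for step = 1:nSelect
    bestJ = -Inf; bestFeat = 0;
    for f = setdiff(1:nFeat, selected)
        trial = [selected f];            % current subset plus one candidate
        J = 0;                           % sum of pairwise class distances
        for a = 1:(nClasses - 1)
            for b = (a + 1):nClasses
                J = J + bhatt(classes{a}(:, trial), classes{b}(:, trial));
            end
        end
        if J > bestJ
            bestJ = J; bestFeat = f;
        end
    end
    selected = [selected bestFeat];
    fprintf('Step %d: added feature %d (J = %.3f)\n', step, bestFeat, bestJ);
end
```

Summing over class pairs is a design choice; taking the minimum pairwise distance instead would favour subsets that separate the closest pair of classes. For larger subsets, `det` can underflow or overflow, so a log-determinant computed from a Cholesky factor may be a safer choice.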

answered Nov 15 '22 07:11 by Arthur B.