Sequential feature selection Matlab

Can somebody explain how to use the MATLAB function sequentialfs?

It looks straightforward, but I don't know how to design a function handle for it.

Any clue?

asked Nov 27 '11 by Mohamad Ibrahim


1 Answer

Here's a simpler example than the one in the documentation.

First let's create a very simple dataset. We have some class labels y. 500 are from class 0, and 500 are from class 1, and they are randomly ordered.

>> y = [zeros(500,1); ones(500,1)];
>> y = y(randperm(1000));

And we have 100 variables x that we want to use to predict y. 99 of them are just random noise, but one of them is highly correlated with the class label.

>> x = rand(1000,99);
>> x(:,100) = y + rand(1000,1)*0.1;

Now let's say we want to classify the points using linear discriminant analysis. If we were to do this directly without applying any feature selection, we would first split the data up into a training set and a test set:

>> xtrain = x(1:700, :); xtest = x(701:end, :);
>> ytrain = y(1:700); ytest = y(701:end);

Then we would classify them:

>> ypred = classify(xtest, xtrain, ytrain);

And finally we would measure the error rate of the prediction:

>> sum(ytest ~= ypred)
ans =
     0

and in this case we get perfect classification.

To make a function handle to be used with sequentialfs, just put these pieces together:

>> f = @(xtrain, ytrain, xtest, ytest) sum(ytest ~= classify(xtest, xtrain, ytrain));
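Note that sequentialfs requires the criterion function to accept four arguments, (XTRAIN, ytrain, XTEST, ytest), and return a scalar criterion value; here that scalar is the misclassification count. If you prefer, the same criterion can be written as a named function instead of an anonymous one (the file name misclassCount.m below is just an illustrative choice):

```matlab
function err = misclassCount(xtrain, ytrain, xtest, ytest)
% Criterion function for sequentialfs: train a linear discriminant
% classifier on (xtrain, ytrain), predict on xtest, and return the
% number of misclassified test points.
ypred = classify(xtest, xtrain, ytrain);
err = sum(ytest ~= ypred);
end
```

You would then call sequentialfs(@misclassCount, x, y).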

And pass all of them together into sequentialfs:

>> fs = sequentialfs(f,x,y)
fs =
  Columns 1 through 16
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 17 through 32
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 33 through 48
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 49 through 64
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 65 through 80
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 81 through 96
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 97 through 100
     0     0     0     1

The final 1 in the output indicates that variable 100 is, as expected, the best predictor of y among the variables in x.
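Since fs is a logical row vector, you can use it directly to index into the data and retrain using only the selected feature(s). A minimal sketch:

```matlab
% Keep only the columns that sequentialfs selected
% (here, just column 100) and reclassify:
xtrain_fs = xtrain(:, fs);
xtest_fs  = xtest(:, fs);
ypred = classify(xtest_fs, xtrain_fs, ytrain);
```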

The example in the documentation for sequentialfs is a little more complex, mostly because the predicted class labels there are strings rather than numerical values as above, so the error rate is computed with ~strcmp rather than ~=. It also uses cross-validation to estimate the error rate, rather than the direct evaluation above.
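To use cross-validation here as well, you can pass a cvpartition object via the 'cv' option. A sketch (the 10-fold choice and the 'display' setting are illustrative, not taken from the documentation example):

```matlab
% 10-fold stratified partition of the class labels
c = cvpartition(y, 'k', 10);

% Print one line of progress per feature added or removed
opts = statset('display', 'iter');

% sequentialfs now averages the criterion over the folds
fs = sequentialfs(f, x, y, 'cv', c, 'options', opts);
```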

answered Nov 03 '22 by Sam Roberts