I am slightly confused about what "feature selection", "feature extraction", and "feature weights" mean and the differences between them. As I read the literature I sometimes feel lost, since the terms seem to be used quite loosely. My primary concerns are --
When people talk of Feature Frequency or Feature Presence, is that feature selection?
When people talk of algorithms such as Information Gain or Maximum Entropy, is it still feature selection?
If I train a classifier with a feature set that, as an example, notes the position of a word within a document, would one still call this feature selection?
Thanks, Rahul Dighe
Rahul-
All of these are good answers. The one thing I would mention is that the fundamental difference between selection and extraction has to do with how you are treating the data.
Feature Extraction methods are transformative -- that is, you apply a transformation to your data to project it into a new, lower-dimensional feature space. PCA and SVD are examples of this.
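For concreteness, here is a minimal sketch of feature extraction with scikit-learn's PCA on synthetic data (the array shapes and component count are arbitrary choices for illustration). The key point is that every output column is a linear combination of the original columns, not one of the originals:

```python
# Feature extraction sketch: project D = 10 original features
# onto d = 3 principal components. The new features are linear
# combinations of the old ones, so no original column survives intact.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples, D = 10 original features

pca = PCA(n_components=3)        # project down to d = 3
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (100, 3)
print(pca.explained_variance_ratio_) # variance captured by each component
```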
Feature Selection methods choose a subset of the original features based on some criterion; Information Gain, Correlation, and Mutual Information are simply criteria used to filter out unimportant or redundant features. Embedded and wrapper methods, as they are called, can use specialized classifiers to achieve feature selection and classify the dataset at the same time.
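As a sketch of the filter approach, here is one way to do mutual-information-based selection with scikit-learn (the synthetic dataset and k=3 are made up for illustration). In contrast to the PCA example above, the surviving columns are unchanged original features:

```python
# Feature selection sketch: score each original feature by its
# mutual information with the label, then keep only the top k.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=3, random_state=0)

selector = SelectKBest(score_func=mutual_info_classif, k=3)
X_selected = selector.fit_transform(X, y)

print(X_selected.shape)                    # (200, 3)
print(selector.get_support(indices=True)) # indices of the kept original features
```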
A really nice overview of the problem space is given here.
Good Luck!
Feature extraction: reduce dimensionality by a (linear or non-linear) projection of the D-dimensional feature vector onto a d-dimensional vector (d < D). Example: principal component analysis
Feature selection: reduce dimensionality by selecting a subset of the original variables. Example: forward or backward feature selection
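As a minimal sketch of the forward-selection example, here is one way to do it with scikit-learn's SequentialFeatureSelector (the estimator and feature counts are arbitrary choices here). Starting from an empty set, it greedily adds the single feature that most improves the cross-validated score until the requested subset size is reached; direction="backward" instead starts from all features and removes them one at a time:

```python
# Forward feature selection sketch: greedily grow a subset of the
# original features, one feature per step, by cross-validated score.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=8,
                           n_informative=3, random_state=0)

sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,
                                direction="forward")
sfs.fit(X, y)

print(sfs.get_support(indices=True))  # indices of the selected original features
```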