
Difference between PCA (Principal Component Analysis) and Feature Selection

What is the difference between Principal Component Analysis (PCA) and Feature Selection in Machine Learning? Is PCA a means of feature selection?

AbhinavChoudhury asked Apr 27 '13 07:04


People also ask

Does PCA do feature selection?

A feature selection method is proposed to select a subset of variables in principal component analysis (PCA) that preserves as much information present in the complete data as possible. The information is measured by means of the percentage of consensus in generalised Procrustes analysis.

What is principal component analysis, and can we use PCA for feature selection?

Perhaps the most popular technique for dimensionality reduction in machine learning is Principal Component Analysis, or PCA for short. This is a technique that comes from the field of linear algebra and can be used as a data preparation technique to create a projection of a dataset prior to fitting a model.

Should I do feature selection before PCA?

The correct answer is: it depends. Typically a feature selection step comes after PCA (with an optimization parameter describing the number of features), and scaling comes before PCA. However, depending on the problem, this may change. You might want to apply PCA only to a subset of features.

What is the difference between feature selection and dimensionality reduction?

Feature selection vs. dimensionality reduction: feature selection simply keeps some features and excludes others without changing them. Dimensionality reduction transforms the features into a lower-dimensional space.


1 Answer

PCA is a way of finding the directions (linear combinations of the original features) that best describe the variance in a data set. It's most often used for reducing the dimensionality of a large data set so that it becomes more practical to apply machine learning where the original data are inherently high dimensional (e.g. image recognition).
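
To make the distinction in the question concrete, here is a minimal sketch using NumPy on made-up data (the toy data set, the variance-based selection rule, and all variable names are assumptions for illustration, not part of the original answer). Feature selection keeps a subset of the original columns untouched, while PCA builds new columns that mix the original ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 samples, 3 features.
# Feature 1 is almost a copy of feature 0; feature 2 is independent.
x0 = rng.normal(size=200)
X = np.column_stack([
    x0,
    x0 + 0.1 * rng.normal(size=200),
    0.5 * rng.normal(size=200),
])

# Feature selection (one simple rule, chosen for illustration):
# keep the two columns with the highest variance, unchanged.
keep = np.sort(np.argsort(X.var(axis=0))[-2:])
X_selected = X[:, keep]

# PCA via SVD: center the data, then project onto the top 2 directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:2]            # each row is a direction in feature space
X_pca = Xc @ components.T      # new features: linear mixes of the originals
explained_ratio = S**2 / np.sum(S**2)

print("selected columns:", keep)
print("first principal direction:", components[0])
print("explained variance ratio:", explained_ratio)
```

The selected columns are still the original measurements, so they keep their meaning. The first principal direction puts weight on both correlated features at once, so each PCA output is a blend rather than any single original feature.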

PCA has limitations though, because it relies on linear relationships between feature elements and it's often unclear what the relationships are before you start. As it also "hides" feature elements that contribute little to the variance in the data, it can sometimes eradicate a small but significant differentiator that would affect the performance of a machine learning model.
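
The last point can be illustrated with a small sketch, again on invented data and with NumPy only (the data, the simple threshold classifier, and the helper name `threshold_accuracy` are hypothetical). One feature has large variance but says nothing about the class; the other has small variance but separates the classes almost perfectly. Keeping only the first principal component throws the useful one away.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
y = np.repeat([0, 1], n)  # two classes, n samples each

# High-variance feature that is unrelated to the class label.
noise = rng.normal(scale=10.0, size=2 * n)
# Low-variance feature that separates the classes (means -1 and +1).
signal = np.where(y == 1, 1.0, -1.0) + rng.normal(scale=0.2, size=2 * n)
X = np.column_stack([noise, signal])

# PCA via SVD, keeping only the first principal component.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

def threshold_accuracy(values, labels):
    """Accuracy of predicting class 1 when the value is above zero."""
    pred = (values > 0).astype(int)
    acc = np.mean(pred == labels)
    # The sign of a principal component is arbitrary, so allow either orientation.
    return max(acc, 1 - acc)

acc_pc1 = threshold_accuracy(pc1, y)
acc_signal = threshold_accuracy(Xc[:, 1], y)

print("first principal direction:", Vt[0])
print("accuracy using PC1 only:", acc_pc1)
print("accuracy using the low-variance feature:", acc_signal)
```

In this setup the first principal direction lines up almost entirely with the noisy feature, so a model given only that component has little to work with, whereas the discarded low-variance feature would have been enough on its own. PCA ranks directions by variance, not by how useful they are for the prediction target.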

Roger Rowland answered Sep 19 '22 23:09