I'd like to ask everyone a question about how correlated features (variables) affect the classification accuracy of machine learning algorithms. By correlated features I mean features that are correlated with each other, not with the target class (e.g., the perimeter and the area of a geometric figure, or the level of education and the average income). In my opinion correlated features negatively affect the accuracy of a classification algorithm, because the correlation makes one of them redundant. Is this really the case? Does the answer depend on the type of classification algorithm? Any suggestions for papers and lectures are very welcome! Thanks
This answer refers to least-squares estimation. Writing the design matrix through its singular value decomposition, X = U S Vᵀ, the variance of the least-squares weights is Var(Wₗₛ) = σ² V S⁻² Vᵀ. When we have highly correlated features in the dataset, some of the singular values in the “S” matrix will be close to zero, so the inverse square of “S” (S⁻² in the equation above) will be large, which makes the variance of Wₗₛ large. This is why it is advised to keep only one feature in the dataset if two features are highly correlated.
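As a minimal sketch of that effect (the data and the 1e-3 noise scale are invented for illustration), two nearly identical columns in the design matrix produce one tiny singular value, and the S⁻² factor then blows up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)      # nearly an exact copy of x1
X = np.column_stack([x1, x2])

# Singular values of the design matrix: the strong correlation between
# the two columns drives the smallest singular value toward zero.
s = np.linalg.svd(X, compute_uv=False)
print(s)

# Var(W_ls) is proportional to V S^-2 V^T, so a tiny singular value
# inflates the variance of the least-squares weights enormously.
print((1.0 / s**2).max())
```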
Positive correlation means that if feature A increases then feature B also increases, and if feature A decreases then feature B also decreases; the two features move in tandem in an (approximately) linear relationship. Negative correlation means that if feature A increases then feature B decreases, and vice versa.
The stronger the correlation, the more difficult it is to change one variable without changing the other. It becomes difficult for the model to estimate the relationship between each independent variable and the dependent variable separately, because the independent variables tend to change in unison.
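Those definitions can be checked directly with NumPy's `corrcoef` (a toy sketch; the arrays are made up for illustration):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = 2.0 * a + 1.0     # moves in tandem with a: positive correlation
c = -3.0 * a + 10.0   # moves against a: negative correlation

print(np.corrcoef(a, b)[0, 1])   # close to +1 (perfect positive correlation)
print(np.corrcoef(a, c)[0, 1])   # close to -1 (perfect negative correlation)
```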
Correlated features do not affect classification accuracy per se. The problem in realistic situations is that we have a finite number of training examples with which to train a classifier. For a fixed number of training examples, increasing the number of features typically increases classification accuracy up to a point, but as the number of features continues to increase, classification accuracy will eventually decrease, because we are then undersampled relative to the large number of features. To learn more about the implications of this, look at the curse of dimensionality.
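That peaking effect can be simulated with a simple nearest-centroid classifier (a sketch on invented synthetic data: two informative features plus a varying number of irrelevant ones, with the training set size held fixed):

```python
import numpy as np

def nearest_centroid_accuracy(n_noise, seed=3):
    """Accuracy with 2 informative features plus n_noise irrelevant ones."""
    rng = np.random.default_rng(seed)
    n_train, n_test = 40, 1000

    def sample(n):
        y = rng.integers(0, 2, size=n)
        means = np.where(y[:, None] == 1, 1.0, -1.0)   # class means at +/-1
        informative = means + rng.normal(size=(n, 2))
        noise = rng.normal(size=(n, n_noise))          # pure noise features
        return np.hstack([informative, noise]), y

    X_train, y_train = sample(n_train)
    X_test, y_test = sample(n_test)
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    pred = (np.linalg.norm(X_test - c1, axis=1)
            < np.linalg.norm(X_test - c0, axis=1)).astype(int)
    return (pred == y_test).mean()

# With the training set fixed at 40 examples, piling on irrelevant
# features eventually hurts accuracy.
print(nearest_centroid_accuracy(0))
print(nearest_centroid_accuracy(100))
```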
If two numerical features are perfectly correlated, then one adds no additional information (it is determined by the other). So if the number of features is too high relative to the training sample size, it is beneficial to reduce the number of features through a feature extraction technique (e.g., principal component analysis).
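A quick sketch of that point (made-up data; plain NumPy rather than a full PCA routine): when one column is an exact linear function of another, the sample covariance matrix has rank 1, so a single principal component carries all of the variance:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
X = np.column_stack([x, 3.0 * x - 2.0])   # second feature determined by the first

# Eigenvalues of the sample covariance matrix (ascending order):
# one of them is numerically zero, so PCA would keep just one component.
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)
print(eigvals)
```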
The effect of correlation does depend on the type of classifier. Some nonparametric classifiers are less sensitive to correlation of variables (although training time will likely increase with an increase in the number of features). For statistical methods such as Gaussian maximum likelihood, having too many correlated features relative to the training sample size will render the classifier unusable in the original feature space (the covariance matrix of the sample data becomes singular).
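That failure mode is easy to reproduce (a toy sketch with invented data): with two perfectly correlated features the sample covariance matrix is rank-deficient, and Gaussian maximum likelihood, which needs its inverse, breaks down:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
X = np.column_stack([x, 2.0 * x])        # perfectly correlated features

cov = np.cov(X, rowvar=False)
print(np.linalg.matrix_rank(cov))        # 1, not 2: the matrix is singular
print(np.linalg.det(cov))                # (numerically) zero determinant

# Gaussian maximum likelihood requires cov^-1, which does not exist here;
# a regularized or pseudo-inverse estimate would be needed instead.
```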