Does it help classification to add linear or non-linear combinations of the existing features? For example, does it help to add the mean and variance computed from the existing features as new features? I believe this depends on the classification algorithm: in the case of PCA, the algorithm itself generates new features that are orthogonal to each other and are linear combinations of the input features. But how does this affect decision-tree-based classifiers or others?
Yes, combinations of existing features can give new features and help with classification. Moreover, a combination of a feature with itself (e.g. a polynomial of the feature) can be used as additional data during classification.
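As a quick illustration of the question's own example, here is a minimal sketch (assuming NumPy is available) of augmenting a feature matrix with derived columns such as the row-wise mean, the row-wise variance, an interaction term, and a squared term. The column choices are purely illustrative.

```python
import numpy as np

X = np.array([[4.0, 2.0],
              [2.0, 3.0]])                      # original features x, y

row_mean = X.mean(axis=1, keepdims=True)        # mean of existing features
row_var  = X.var(axis=1, keepdims=True)         # variance of existing features
product  = (X[:, 0] * X[:, 1]).reshape(-1, 1)   # interaction term x * y
squared  = (X[:, 0] ** 2).reshape(-1, 1)        # polynomial term x^2

X_augmented = np.hstack([X, row_mean, row_var, product, squared])
print(X_augmented.shape)                        # (2, 6): 2 original + 4 derived columns
```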
As an example, consider a logistic regression classifier with the following linear formula at its core:
g(x, y) = 1*x + 2*y
Imagine that you have 2 observations, for example (x=4, y=2) and (x=2, y=3). In both cases g() is equal to 8, so if the observations belong to different classes, you have no way to distinguish them. But let's add one more variable (feature) z, which is a combination of the previous 2 features, z = x * y:
g(x, y, z) = 1*x + 2*y + 0.5*z
Now for the same observations we have g(4, 2, 8) = 12 and g(2, 3, 6) = 11. So we get 2 different values and can distinguish between the 2 observations.
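The example above can be checked directly. This is a small sketch in plain Python (the observation values are the illustrative ones used above) showing that the linear score cannot separate the two points, while the score with the interaction feature z = x * y can:

```python
def g_linear(x, y):
    # original score: 1*x + 2*y
    return 1 * x + 2 * y

def g_with_interaction(x, y):
    # same score plus the interaction feature z = x * y
    z = x * y
    return 1 * x + 2 * y + 0.5 * z

a = (4, 2)   # illustrative observation 1
b = (2, 3)   # illustrative observation 2

print(g_linear(*a), g_linear(*b))                      # 8 8   -> identical
print(g_with_interaction(*a), g_with_interaction(*b))  # 12.0 11.0 -> different
```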
Polynomial features (x^2, x^3, y^2, etc.) do not give additional points, but instead change the shape of the function. For example, g(x) = a0 + a1*x is a line, while g(x) = a0 + a1*x + a2*x^2 is a parabola and thus can fit the data much more closely.
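You usually do not have to build such terms by hand. A minimal sketch, assuming scikit-learn is installed, of generating all polynomial and interaction terms up to a given degree:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# degree=2 produces x, y, x^2, x*y, y^2 (no constant column with include_bias=False)
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)

print(poly.get_feature_names_out(['x', 'y']))  # requires a recent scikit-learn version
print(X_poly)
```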
In general, it is usually better to have more features. Unless you already have very predictive features (i.e. ones that allow near-perfect separation of the classes), I would recommend adding more. In practice, many classification algorithms (and decision tree inducers in particular) select the most useful features for their purposes anyway, as the sketch below illustrates.
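A rough sketch (again assuming scikit-learn; the data here is synthetic and purely illustrative) of how a decision tree weighs an added interaction feature. The class label depends on the sign of x*y, which a shallow tree on the raw features struggles with, so feature_importances_ shows the tree leaning heavily on the added column while largely ignoring redundant ones:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)            # class depends on the product x*y

# add the interaction feature z = x*y as a third column
X_aug = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_aug, y)
print(tree.feature_importances_)                   # the interaction column tends to dominate
```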