
Do combinations of existing features make new features?

Does it help classification if I add linear or non-linear combinations of the existing features? For example, does it help to add the mean and variance computed from the existing features as new features? I believe it depends on the classification algorithm: in the case of PCA, the algorithm itself generates new features that are orthogonal to each other and are linear combinations of the input features. But what is the effect for decision-tree-based classifiers or others?

Asked by Pranay on Dec 04 '22


2 Answers

Yes, combinations of existing features can make new features and help with classification. Moreover, a combination of a feature with itself (e.g., a polynomial in that feature) can be used as additional data during classification.

As an example, consider a logistic regression classifier with the following linear formula at its core:

g(x, y) = 1*x + 2*y

Imagine that you have 2 observations:

  1. x = 6; y = 1
  2. x = 3; y = 2.5

In both cases g() will be equal to 8. If the observations belong to different classes, you have no way to distinguish between them. But let's add one more variable (feature) z, which is a combination of the previous two features, z = x * y:

g(x, y, z) = 1*x + 2*y + 0.5*z

Now for the same observations we have:

  1. x = 6; y = 1; z = 6 * 1 = 6 ==> g() = 11
  2. x = 3; y = 2.5; z = 3 * 2.5 = 7.5 ==> g() = 11.75

So now we get 2 different values of g() and can distinguish between the 2 observations.
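
Here is the same arithmetic as a short Python sketch (just the numbers from the example above; the weights 1, 2 and 0.5 come from the formulas):

```python
# Minimal sketch of the example above: the two observations collide
# under g(x, y) but become distinguishable once the interaction
# feature z = x * y is added.
observations = [(6, 1), (3, 2.5)]

for x, y in observations:
    g_xy = 1 * x + 2 * y              # original two-feature score
    z = x * y                         # derived feature: interaction of x and y
    g_xyz = 1 * x + 2 * y + 0.5 * z   # score with the combined feature
    print(f"x={x}, y={y}: g(x, y)={g_xy}, g(x, y, z)={g_xyz}")

# x=6, y=1: g(x, y)=8, g(x, y, z)=11.0
# x=3, y=2.5: g(x, y)=8.0, g(x, y, z)=11.75
```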

Polynomial features (x^2, x^3, y^2, etc.) do not add new data points, but instead change the graph of the function. For example, g(x) = a0 + a1*x is a line, while g(x) = a0 + a1*x + a2*x^2 is a parabola and can thus fit the data much more closely.
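
If you work with scikit-learn, the PolynomialFeatures transformer generates these polynomial and interaction terms automatically. A sketch, assuming a reasonably recent scikit-learn (get_feature_names_out needs version 1.0+):

```python
# Sketch: expanding [x, y] into polynomial/interaction features.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[6.0, 1.0],
              [3.0, 2.5]])  # the two observations from the example above

# degree=2 expands [x, y] into [x, y, x^2, x*y, y^2]
poly = PolynomialFeatures(degree=2, include_bias=False)
X_expanded = poly.fit_transform(X)

print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
print(X_expanded)
# [[ 6.    1.   36.    6.    1.  ]
#  [ 3.    2.5   9.    7.5   6.25]]
```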

Answered by ffriend on Jan 08 '23


In general, it's always better to have more features. Unless you already have very predictive features (i.e. features that allow for perfect separation of the classes to predict), I would always recommend adding more features. In practice, many classification algorithms (and decision tree inducers in particular) select the best features for their purposes anyway, as the sketch below illustrates.
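
As a quick illustration of that selection behaviour, here is a sketch with made-up synthetic data (the threshold 25 and the sample size are arbitrary, not from the question): a decision tree given the raw features plus a derived x*y feature ends up relying almost entirely on the derived one, because the class actually depends on it.

```python
# Sketch: a decision tree picking out the most useful (derived) feature.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y = rng.uniform(0, 10, size=500)
labels = (x * y > 25).astype(int)    # class depends on the interaction x*y

X = np.column_stack([x, y, x * y])   # raw features plus the derived x*y
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)

for name, importance in zip(["x", "y", "x*y"], tree.feature_importances_):
    print(f"{name}: {importance:.3f}")
# x*y should carry nearly all of the importance: the tree "selects" it.
```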

Answered by Lars Kotthoff on Jan 08 '23