Just getting started with this library and having some issues with RandomForestClassifier (I've read the docs, but they didn't make this clear).
My question is pretty simple. Say I have a training data set like:
A B C
1 2 3
where A is the dependent variable (the target, y) and B and C are the independent variables (the features, x). Let's say the test set has the same columns, but in a different order:
B A C
1 2 3
When I call forest.fit(train_data[0:,1:], train_data[0:,0]), do I then need to reorder the test set to match this column order before calling predict? (Ignore the fact that I also need to remove the target column A from the test set; let's just say B and C are out of order.)
Yes, you need to reorder them. Consider a simpler case, linear regression: the algorithm learns a weight for each feature, so if, for example, feature 1 is unimportant, it is assigned a weight close to 0.
If at prediction time the columns are in a different order, an important feature will be multiplied by that near-zero weight (and an unimportant one by a large weight), and the prediction will be completely wrong. The same holds for a random forest: each tree split tests a particular column position, so swapped columns send samples down the wrong branches.
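A minimal sketch of the linear regression argument, assuming NumPy and scikit-learn are installed. The data, the column-name lists `train_cols`/`test_cols`, and the test values are made up for illustration: A is defined as exactly 3 * B, so C should get a near-zero weight. It compares predicting on the test columns as they arrive against selecting them by name in the training order.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data with columns A, B, C, where the target A
# depends only on B (A = 3 * B) and C is pure noise.
rng = np.random.default_rng(0)
B = rng.uniform(0, 10, size=200)
C = rng.uniform(0, 10, size=200)
A = 3.0 * B
train_cols = ["A", "B", "C"]
train_data = np.column_stack([A, B, C])

# Fit on the feature columns in training order (B, C); A is the target.
model = LinearRegression()
model.fit(train_data[:, 1:], train_data[:, 0])
print(model.coef_)  # roughly [3, 0]: B matters, C gets a near-zero weight

# Hypothetical test set whose columns arrive in the order C, A, B.
test_cols = ["C", "A", "B"]
test_data = np.array([[5.0, 3.0, 1.0],
                      [2.0, 12.0, 4.0]])

# Wrong: drop A and keep the remaining columns as they come (C, B).
X_wrong = np.delete(test_data, test_cols.index("A"), axis=1)
pred_wrong = model.predict(X_wrong)  # ≈ [15, 6]: C is multiplied by B's weight

# Right: select the feature columns by name, in the order used during fit.
feature_cols = train_cols[1:]  # ["B", "C"]
X_right = test_data[:, [test_cols.index(name) for name in feature_cols]]
pred_right = model.predict(X_right)  # ≈ [3, 12], matching the A column

print(pred_wrong, pred_right)
```

If the data lives in pandas DataFrames instead, test_df[feature_cols] performs the same by-name selection and reordering. The same lookup matters for RandomForestClassifier, since its trees also address features by column position.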