
Scikitlearn - order of fit and predict inputs, does it matter?

I'm just getting started with this library and am having some issues with RandomForestClassifier (I've read the docs, but they didn't make this clear).

My question is pretty simple. Say I have a training data set like:

A B C

1 2 3

Where A is the dependent variable (y) and B and C are the independent variables (x). Let's say the test set looks the same, but the column order is:

B A C

1 2 3

When I call forest.fit(train_data[0:,1:], train_data[0:,0]), do I then need to reorder the test set to match this column order before predicting? (Ignore the fact that I need to remove the already known y value (A); let's just say B and C are out of order.)

Solaxun asked Mar 18 '23 04:03


1 Answer

Yes, you need to reorder them. Imagine a simpler case, linear regression. The algorithm calculates a weight for each feature, so if, for example, feature 1 is unimportant, it gets a weight close to 0. The model ties each weight to a column position, not to a column name. The same holds for a random forest, where each tree split tests a feature by its column index.

If the order is different at prediction time, an important feature will be multiplied by this near-zero weight, and the prediction will be totally off.
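The effect can be shown without any library. The following is a minimal sketch in plain Python, not scikit-learn code: the weights, the column names, and the helper functions predict and reorder are all made up for illustration. It assumes a linear model that was fitted on columns in the order B, C and then receives a test row in the order C, B.

```python
# Hypothetical weights for features in the training order [B, C].
# The values are invented: B matters a lot, C barely matters.
train_columns = ["B", "C"]
weights = [5.0, 0.01]


def predict(row, weights):
    """Linear prediction. The model only sees positions, not column names."""
    return sum(w * x for w, x in zip(weights, row))


def reorder(row, row_columns, target_columns):
    """Rearrange the values of a row so they follow target_columns."""
    return [row[row_columns.index(name)] for name in target_columns]


# A test row that arrives with the columns swapped: [C, B].
test_columns = ["C", "B"]
test_row = [100.0, 2.0]  # C = 100, B = 2

# Wrong: C's value is multiplied by B's large weight.
wrong = predict(test_row, weights)

# Right: put the values back in the training order before predicting.
right = predict(reorder(test_row, test_columns, train_columns), weights)

print("without reordering:", wrong)
print("with reordering:   ", right)
```

With a real scikit-learn estimator the fix is the same idea: arrange the test columns in the order used for fit before calling predict.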

elyase answered Mar 19 '23 17:03