
Error with Sklearn Random Forest Regressor

When trying to fit a Random Forest Regressor model with y data that looks like this:

[  0.00000000e+00   1.36094276e+02   4.46608221e+03   8.72660888e+03
   1.31375786e+04   1.73580193e+04   2.29420671e+04   3.12216341e+04
   4.11395711e+04   5.07972062e+04   6.14904935e+04   7.34275322e+04
   7.87333933e+04   8.46302456e+04   9.71074959e+04   1.07146672e+05
   1.17187952e+05   1.26953374e+05   1.37736003e+05   1.47239359e+05
   1.53943242e+05   1.78806710e+05   1.92657725e+05   2.08912711e+05
   2.22855152e+05   2.34532982e+05   2.41391255e+05   2.48699216e+05
   2.62421197e+05   2.79544300e+05   2.95550971e+05   3.13524275e+05
   3.23365158e+05   3.24069067e+05   3.24472999e+05   3.24804951e+05

And X data that looks like this:

[ 735233.27082176  735234.27082176  735235.27082176  735236.27082176
  735237.27082176  735238.27082176  735239.27082176  735240.27082176
  735241.27082176  735242.27082176  735243.27082176  735244.27082176
  735245.27082176  735246.27082176  735247.27082176  735248.27082176

With the following code:

regressor = RandomForestRegressor(n_estimators=150, min_samples_split=1)
rgr = regressor.fit(X,y) 

I get this error:

ValueError: Number of labels=600 does not match number of samples=1

I assume one of my sets of values is in the wrong format, but it's not too clear to me from the documentation.

BLL27 asked Aug 25 '15 07:08



2 Answers

The shape of X should be [n_samples, n_features]. Your X is one-dimensional, so you can turn it into a single-feature column with:

X = X[:, None]
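
As a rough end-to-end sketch of that fix: the arrays in the question are truncated, so the data below is made up (600 daily timestamps and an increasing target, chosen only to match the sample count in the error message). It also leaves `min_samples_split` at its default, because recent scikit-learn releases reject an integer value of 1.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Made-up stand-in data (assumption): 600 daily timestamps and an
# increasing target, roughly the shape of the series in the question.
rng = np.random.default_rng(0)
X = 735233.27082176 + np.arange(600)            # shape (600,)
y = np.cumsum(rng.uniform(0, 1000, size=600))   # shape (600,)

# Add a feature axis so X becomes [n_samples, n_features].
X_2d = X[:, None]                               # shape (600, 1)
print(X.shape, X_2d.shape)

# min_samples_split is left at its default here; recent scikit-learn
# versions require an integer value of at least 2.
regressor = RandomForestRegressor(n_estimators=150, random_state=0)
rgr = regressor.fit(X_2d, y)

# Inputs to predict need the same 2-D layout.
pred = rgr.predict(X_2d[:5])
print(pred.shape)
```

Anything passed to `predict` later needs the same two-dimensional layout, one row per sample.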
yangjie answered Oct 07 '22 18:10


scikit-learn is treating your one-dimensional X as a single sample with many features rather than as many samples with one feature each, so wrapping each value in its own list works:

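# Wrap each value in its own list so X has shape [n_samples, 1].
# A list comprehension is used because map() returns a lazy iterator in Python 3.
rgr = regressor.fit([[x] for x in X], y)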

There might be a more efficient way of doing this in numpy with vstack.
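
For comparison, here is a small sketch of the NumPy options, assuming X is a one-dimensional NumPy array. The four values are illustrative, following the daily step visible in the question's data. Each approach should yield the same column of shape (n_samples, 1).

```python
import numpy as np

# Illustrative 1-D input (assumption): four consecutive daily timestamps.
X = 735233.27082176 + np.arange(4)

as_list = np.array([[x] for x in X])   # per-element wrapping in Python
by_index = X[:, None]                  # add a new axis via indexing
by_reshape = X.reshape(-1, 1)          # let NumPy infer the row count
by_vstack = np.vstack(X)               # stack each scalar as its own row

for name, arr in [("list", as_list), ("index", by_index),
                  ("reshape", by_reshape), ("vstack", by_vstack)]:
    print(name, arr.shape)
```

The indexing and reshape forms avoid a Python-level loop over the elements, which is usually the reason to prefer them on large arrays.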

Francisco Vargas answered Oct 07 '22 18:10