A column-vector y was passed when a 1d array was expected

I need to fit RandomForestRegressor from sklearn.ensemble.

forest = ensemble.RandomForestRegressor(**RF_tuned_parameters)
model = forest.fit(train_fold, train_y)
yhat = model.predict(test_fold)

This code always worked until I added some preprocessing of the data (train_y). Now I get this warning:

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().

model = forest.fit(train_fold, train_y)

Previously train_y was a Series; now it is a NumPy array (a column vector). If I apply train_y.ravel(), it becomes a 1-D array and the warning disappears, though the prediction step takes a very long time (actually it never finishes...).

In the docs of RandomForestRegressor I found that train_y should be defined as y : array-like, shape = [n_samples] or [n_samples, n_outputs]. Any idea how to solve this issue?

Klausos Klausos asked Dec 08 '15 20:12


2 Answers

Change this line:

model = forest.fit(train_fold, train_y) 

to:

model = forest.fit(train_fold, train_y.values.ravel()) 

Explanation:

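.values returns the data as a NumPy array; for a single-column DataFrame its shape is (n, 1). If train_y is already a NumPy array, it has no .values attribute, so call train_y.ravel() directly.

.ravel() flattens that array to shape (n,).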
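The shape change described above can be sketched as follows. This is only an illustration: it assumes train_y is a single-column pandas DataFrame, and the column name "target" and the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column target standing in for train_y.
train_y = pd.DataFrame({"target": [1.5, 2.0, 3.5, 4.0]})

column = train_y.values   # 2-D NumPy array with one column
flat = column.ravel()     # 1-D NumPy array, the shape fit() expects

print(column.shape)  # (4, 1)
print(flat.shape)    # (4,)

# If the target is already a NumPy column vector, there is no .values
# attribute to go through; ravel() is called on the array itself.
already_array = np.array([[1.5], [2.0], [3.5], [4.0]])
flat_from_array = already_array.ravel()
print(flat_from_array.shape)  # (4,)
```

Either flat or flat_from_array could then be passed as the second argument to forest.fit.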

Linda MacPhee-Cobb answered Oct 02 '22 10:10


I also ran into this when training a KNN classifier. The warning went away after I changed:
knn.fit(X_train, y_train)
to
knn.fit(X_train, np.ravel(y_train, order='C'))

Before this line I had import numpy as np.
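For reference, a small sketch of what np.ravel does to a column vector. The labels in y_train here are invented for illustration. Note that order='C' is already the default for np.ravel, and for a single column the order makes no difference to the result.

```python
import numpy as np

# Hypothetical labels shaped as a column vector, (n_samples, 1).
y_train = np.array([[0], [1], [1], [0]])

y_c = np.ravel(y_train, order="C")  # explicit row-major order
y_default = np.ravel(y_train)       # same result: "C" is the default
y_reshaped = y_train.reshape(-1)    # an equivalent way to flatten

print(y_train.shape)  # (4, 1)
print(y_c.shape)      # (4,)
```

Any of the three flattened arrays has shape (n_samples,), which is what the warning asks for.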

Simon Leung answered Oct 02 '22 09:10