Should a pandas DataFrame column be converted in some way before passing it to a scikit-learn regressor?

I have a pandas DataFrame and am passing df[list_of_columns] as X and df[[single_column]] as Y to a Random Forest regressor.

What does the following warning mean, and what should be done to resolve it?

DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  probas = cfr.fit(trainset_X, trainset_Y).predict(testset_X)
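A minimal sketch that reproduces the warning, assuming a recent scikit-learn. The DataFrame, the column names and the n_estimators value are hypothetical stand-ins for the asker's data, not taken from the question. The key detail is that double-bracket selection returns a one-column DataFrame rather than a Series:

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

# Hypothetical toy data standing in for the asker's DataFrame.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=50),
    "b": rng.normal(size=50),
    "target": rng.normal(size=50),
})
list_of_columns = ["a", "b"]
single_column = "target"

X = df[list_of_columns]      # DataFrame, shape (50, 2)
y_2d = df[[single_column]]   # double brackets -> DataFrame, shape (50, 1)

# Record every warning raised while fitting with the 2-D target.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y_2d)

# Keep only the warning category the question is about.
conversion_warnings = [
    w for w in caught if issubclass(w.category, DataConversionWarning)
]

print(y_2d.shape)  # (50, 1)
for w in conversion_warnings:
    print(w.message)
```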
user2808117 asked Jan 01 '14 12:01



People also ask

Can scikit-learn use pandas DataFrame?

Generally, scikit-learn works on any numeric data stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as pandas DataFrames, are also accepted.

How do you convert DataFrame columns to numeric values?

The usual way to convert one or more columns of a DataFrame to numeric values is pandas.to_numeric(). It tries to convert non-numeric objects (such as strings) into integers or floating-point numbers as appropriate. By default it raises an error on values it cannot parse; passing errors="coerce" turns those values into NaN instead.
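A minimal sketch, assuming string-typed columns such as those often produced by reading a CSV; the column names are hypothetical. Since pandas.to_numeric() operates on a single Series, several columns are converted by applying it column by column:

```python
import pandas as pd

# Illustrative columns stored as strings.
df = pd.DataFrame({
    "qty": ["10", "20", "30"],
    "weight": ["0.5", "1.25", "2"],
    "price": ["1.5", "2", "three"],
})

# Several columns at once: apply to_numeric to each column.
df[["qty", "weight"]] = df[["qty", "weight"]].apply(pd.to_numeric)

# "three" cannot be parsed; errors="coerce" replaces it with NaN instead of raising.
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df.dtypes)
```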

What does reshape do in pandas?

In pandas, data reshaping means transforming the structure of a table or vector (i.e. a DataFrame or Series) to make it suitable for further analysis. Some of pandas' reshaping capabilities do not readily exist in other environments (e.g. SQL or bare-bones R) and can be tricky for a beginner.
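One common reshaping round trip, sketched with hypothetical data: DataFrame.melt turns a "wide" table into a "long" one, and DataFrame.pivot turns it back. These are just two of pandas' reshaping tools, chosen here for illustration:

```python
import pandas as pd

# Illustrative "wide" table: one row per id, one column per measurement.
wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# melt: wide -> long, one row per (id, variable) pair.
long = wide.melt(id_vars="id", var_name="variable", value_name="value")

# pivot: long -> wide again, with ids as the index and variables as columns.
back = long.pivot(index="id", columns="variable", values="value")

print(long)
print(back)
```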


1 Answer

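Check the shape of your Y variable: it should be one-dimensional, i.e. of shape (n_samples,), and you are probably passing something with an extra (trivial) dimension. df[[single_column]] returns a DataFrame of shape (n_samples, 1); select the column with df[single_column] instead, or flatten it into a 1-D array (for example with ravel()).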
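A sketch of the fix on hypothetical toy data (the column names a, b and target are illustrative only). Each option below yields a target of shape (n_samples,); promoting DataConversionWarning to an error makes any leftover 2-D target fail loudly instead of just warning:

```python
import warnings

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.exceptions import DataConversionWarning

# Hypothetical toy data; column names are illustrative only.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.normal(size=50),
    "b": rng.normal(size=50),
    "target": rng.normal(size=50),
})
X = df[["a", "b"]]

# Three ways to get a 1-D target of shape (n_samples,):
y_series = df["target"]                              # single brackets -> Series
y_ravel = df[["target"]].to_numpy().ravel()          # flatten (50, 1) -> (50,)
y_squeeze = df[["target"]].squeeze(axis="columns")   # one-column DataFrame -> Series

targets = {"series": y_series, "ravel": y_ravel, "squeeze": y_squeeze}
shapes = {name: y.shape for name, y in targets.items()}

# Turn DataConversionWarning into an exception so a 2-D y would not slip through.
fitted = {}
with warnings.catch_warnings():
    warnings.simplefilter("error", DataConversionWarning)
    for name, y in targets.items():
        fitted[name] = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

print(shapes)  # {'series': (50,), 'ravel': (50,), 'squeeze': (50,)}
```

When selecting from a DataFrame, single-bracket indexing is the simplest option; ravel() is handy when y is already a NumPy array of shape (n_samples, 1).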

lejlot answered Sep 25 '22 01:09
