
Building multi-regression model throws error: `Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).`

I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numeric variables. When I fit it with a statsmodels model like:

est = sm.OLS(y, X).fit() 

It throws:

Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).  

I converted all the dtypes of the DataFrame using df.convert_objects(convert_numeric=True).

After this, the dtypes of all DataFrame variables appear as int32 or int64. But the last line of the output still shows dtype: object, like this:

4516        int32
4523        int32
4525        int32
4531        int32
4533        int32
4542        int32
4562        int32
sex         int64
race        int64
dispstd     int64
age_days    int64
dtype: object

Here 4516, 4523 are variable labels.

Any ideas? I need to build a multiple regression model on hundreds of variables. To do that, I concatenated 3 pandas DataFrames into the final DataFrame used for model building.

Sanoj asked Nov 20 '15 18:11


2 Answers

If X is your dataframe, try using the .astype method to convert to float when running the model:

est = sm.OLS(y, X.astype(float)).fit() 
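
To see why the cast helps, it can be useful to check what NumPy sees before and after it. The sketch below uses a small made-up DataFrame (the values, and the choice of `sex` as the column stored as Python objects, are assumptions for illustration, not the asker's data). Note that the trailing `dtype: object` printed by `X.dtypes` describes the Series of dtypes itself, so it is not by itself a sign that a column is of object type.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "sex" is stored as generic Python objects,
# which can happen after concatenating or reshaping frames.
X = pd.DataFrame({
    "sex": pd.Series([0, 1, 1, 0], dtype=object),
    "age_days": [100, 250, 310, 420],
})

# List the columns that pandas holds as object dtype.
object_cols = X.select_dtypes(include=["object"]).columns.tolist()
print("object columns:", object_cols)

# This is the check the error message suggests: one object column
# is enough to make the whole array object dtype.
arr_before = np.asarray(X)
print("before cast:", arr_before.dtype)

# Casting to float gives a purely numeric array.
X_float = X.astype(float)
arr_after = np.asarray(X_float)
print("after cast:", arr_after.dtype)
```

If `astype(float)` raises a `ValueError`, at least one column holds values that cannot be parsed as numbers, and that column needs cleaning first.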
Daniel Gibson answered Sep 21 '22 19:09


If both y (the dependent variable) and X are taken from a DataFrame, then cast both to float:

est = sm.OLS(y.astype(float), X.astype(float)).fit() 
kratant adhaulia answered Sep 21 '22 19:09