I have a pandas DataFrame with some categorical predictors (i.e. variables) coded as 0 and 1, and some numeric variables. When I fit it with a statsmodels model like:
est = sm.OLS(y, X).fit()
It throws:
Pandas data cast to numpy dtype of object. Check input data with np.asarray(data).
I converted all the dtypes of the DataFrame using df.convert_objects(convert_numeric=True).
After this, all the column dtypes appear as int32 or int64, but the last line of the output still shows dtype: object, like this:
4516        int32
4523        int32
4525        int32
4531        int32
4533        int32
4542        int32
4562        int32
sex         int64
race        int64
dispstd     int64
age_days    int64
dtype: object
Here 4516, 4523 are variable labels.
Any ideas? I need to build a multiple regression model on hundreds of variables. To do that, I concatenated three pandas DataFrames into the final DataFrame used for model building.
If X is your DataFrame, try using the .astype method to convert it to float when running the model:
est = sm.OLS(y, X.astype(float)).fit()
If both y (the dependent variable) and X are taken from a DataFrame, cast both:
est = sm.OLS(y.astype(float), X.astype(float)).fit()
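To see why the cast helps, it can be useful to check which columns are actually stored as object before fitting. The snippet below is a minimal sketch on made-up data: the column values and the names object_cols and X_float are assumptions for illustration, not taken from the question. It also shows that the trailing "dtype: object" printed by df.dtypes describes the Series of dtypes itself, not any column, so on its own it does not indicate a problem.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one column holds numbers stored as Python objects,
# which can happen after concatenating several DataFrames.
X = pd.DataFrame({
    "4516": pd.Series([0, 1, 1, 0], dtype=object),
    "sex": [1, 0, 1, 1],
    "age_days": [300, 4512, 87, 1290],
})

# List the columns whose dtype is object; these trigger the error.
object_cols = X.columns[X.dtypes == object].tolist()
print(object_cols)

# This is the check the error message suggests: a single object column
# makes the whole converted array object-typed.
print(np.asarray(X).dtype)

# Casting to float gives a purely numeric array.
X_float = X.astype(float)
print(np.asarray(X_float).dtype)

# The Series returned by .dtypes is itself of dtype object,
# even when every column is numeric.
print(X_float.dtypes.dtype)
```

If astype(float) raises a ValueError, at least one column contains values that cannot be parsed as numbers, and that column needs cleaning or encoding before it is passed to sm.OLS.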