I'm trying to run this code (credit goes to Greg):
import pandas as pd
from sklearn.model_selection import train_test_split
import statsmodels.api as sm
quality = pd.read_csv("https://courses.edx.org/c4x/MITx/15.071x/asset/quality.csv")
train, test = train_test_split(quality, train_size=0.75, random_state=1)
qualityTrain = pd.DataFrame(train, columns=quality.columns)
qualityTest = pd.DataFrame(test, columns=quality.columns)
qualityTrain['PoorCare'] = qualityTrain['PoorCare'].astype(int)
cols = ['OfficeVisits', 'Narcotics']
x = qualityTrain[cols]
x = sm.add_constant(x)
y = qualityTrain['PoorCare']
model = sm.Logit(y, x).fit()
model.summary()
But I'm getting:
AttributeError: 'int' object has no attribute 'exp'
on the second-to-last line (the call to fit()). This seems to be introduced by sampling the data with train_test_split, because the model fits just fine on the whole unmodified dataset.
How can I fix this?
Convert the x variable to floats before fitting:
model = sm.Logit(y, x.astype(float)).fit()
With that change the model fits, and I get the following result:
<class 'statsmodels.iolib.summary.Summary'>
"""
                           Logit Regression Results
==============================================================================
Dep. Variable:               PoorCare   No. Observations:                   98
Model:                          Logit   Df Residuals:                       95
Method:                           MLE   Df Model:                            2
Date:                Mon, 23 Mar 2015   Pseudo R-squ.:                  0.2390
Time:                        16:45:51   Log-Likelihood:                -39.714
converged:                       True   LL-Null:                       -52.188
                                        LLR p-value:                 3.823e-06
================================================================================
                   coef    std err          z      P>|z|      [95.0% Conf. Int.]
--------------------------------------------------------------------------------
const           -2.7718      0.561     -4.940      0.000        -3.872    -1.672
OfficeVisits     0.0680      0.031      2.211      0.027         0.008     0.128
Narcotics        0.1223      0.041      2.991      0.003         0.042     0.203
================================================================================
"""
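For anyone wondering why the cast helps, here is a minimal sketch of what is probably going on. It assumes the split hands back a plain NumPy array of mixed types, so the rebuilt DataFrame ends up with object-dtype columns; I have not verified that against the original data, and the rows and the Label column below are made up for illustration. On an object array, np.exp tries to call an exp method on each element, which a plain Python int does not have.

```python
import numpy as np
import pandas as pd

# Made-up mixed-type rows standing in for the array returned by the split
# (assumption: the real data mixes ints, floats and non-numeric values).
rows = np.array([[1, 2.5, "a"], [3, 4.0, "b"]], dtype=object)
frame = pd.DataFrame(rows, columns=["OfficeVisits", "Narcotics", "Label"])

# Columns built from an object array keep the object dtype.
x = frame[["OfficeVisits", "Narcotics"]]
print(x.dtypes)

# np.exp on an object array looks for an .exp method on each element.
# Older NumPy raises AttributeError here, newer versions raise TypeError.
failed = False
try:
    np.exp(x.to_numpy())
except (AttributeError, TypeError) as err:
    failed = True
    print("np.exp failed:", err)

# Casting to float gives a numeric array that np.exp can handle.
fixed = x.astype(float)
result = np.exp(fixed.to_numpy())
print(result)
```

Checking x.dtypes before fitting is a quick way to confirm whether this is what happened in your case.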