I am working on a logistic regression model and I am having trouble understanding how to apply the model fit from my training set to my testing set. Sorry, I am new to Python and VERY new to statsmodels.
import pandas as pd
import statsmodels.api as sm
from sklearn import cross_validation
independent_vars = phy_train.columns[3:]
X_train, X_test, y_train, y_test = cross_validation.train_test_split(phy_train[independent_vars], phy_train['target'], test_size=0.3, random_state=0)
X_train = pd.DataFrame(X_train)
X_train.columns = independent_vars
X_test = pd.DataFrame(X_test)
X_test.columns = independent_vars
y_train = pd.DataFrame(y_train)
y_train.columns = ['target']
y_test = pd.DataFrame(y_test)
y_test.columns = ['target']
logit = sm.Logit(y_train,X_train[subset],missing='drop')
result = logit.fit()
print result.summary()
y_pred = logit.predict(X_test[subset])
From the last line, I get this error:
y_pred = logit.predict(X_test[subset])
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\eMachine\WinPython-64bit-2.7.5.3\python-2.7.5.amd64\lib\site-packages\statsmodels\discrete\discrete_model.py", line 378, in predict
    return self.cdf(np.dot(exog, params))
ValueError: matrices are not aligned
My training and testing data sets have the same number of variables, so I am sure I am misunderstanding what logit.predict() is actually doing.
Both libraries have their uses. Before selecting one over the other, it is best to consider the purpose of the model. A model designed for prediction is best fit using scikit-learn, while statsmodels is best employed for explanatory models.
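To make that contrast concrete, here is a minimal sketch (not from the original post) of fitting the same kind of logistic regression with each library, reusing the X_train, X_test and y_train variables from the question; the scikit-learn version is geared toward producing predictions, while the statsmodels version exposes the coefficient table for explanation:
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm
# scikit-learn: prediction-oriented API
# (note: scikit-learn does not accept NaNs, unlike Logit(..., missing='drop'))
clf = LogisticRegression()
clf.fit(X_train, y_train['target'])
class_labels = clf.predict(X_test)       # hard 0/1 class predictions
# statsmodels: inference-oriented API
model = sm.Logit(y_train, X_train, missing='drop')
res = model.fit()
print(res.summary())                     # coefficients, standard errors, p-values
probabilities = res.predict(X_test)      # predicted probabilities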
sm.add_constant adds a column of ones to the x1 array (data['SAT']). Looking at the head of x after that call, a column of ones appears alongside SAT. This column of ones corresponds to x_0 in the simple linear regression equation: y_hat = b_0 * x_0 + b_1 * x_1.
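A minimal sketch of that behaviour, using made-up SAT values in place of the tutorial's data['SAT']:
import pandas as pd
import statsmodels.api as sm
# hypothetical stand-in for data['SAT']
x1 = pd.Series([1714, 1664, 1760, 1685], name='SAT')
# add_constant prepends a column of ones named 'const';
# it plays the role of x_0 (the intercept) in y_hat = b_0 * x_0 + b_1 * x_1
x = sm.add_constant(x1)
print(x.head())
#    const   SAT
# 0    1.0  1714
# 1    1.0  1664
# 2    1.0  1760
# 3    1.0  1685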
While statsmodels doesn't offer as wide a variety of options, it focuses on the statistical and econometric tools used in statistics software like Stata and R. Its syntax is similar to R's, so for those transitioning to Python, statsmodels is a good choice.
There are two predict methods.
logit in your example is the model instance. The model instance doesn't know about the estimation results. The model's predict has a different signature because it also needs the parameters: logit.predict(params, exog). This is mainly interesting for internal usage.
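For illustration, a short sketch of the two signatures side by side, using the variable names from the question (subset being whatever column list was used to build the model):
# results-level predict: uses the parameters stored in the fitted results
y_pred = result.predict(X_test[subset])
# model-level predict: the same computation, but the parameters must be passed explicitly;
# calling logit.predict(X_test[subset]) passes the test data where params is expected,
# which is what triggers the "matrices are not aligned" error above
y_pred_manual = logit.predict(result.params, X_test[subset])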
What you want is the predict method of the results instance. In your example,
y_pred = result.predict(X_test[subset])
should give the correct results. It uses the estimated parameters to make predictions with your new test data of explanatory variables, X_test.
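One follow-up worth noting (a sketch, not part of the original answer): for a Logit model, result.predict returns predicted probabilities, so to compare against y_test you would typically apply a cutoff, for example:
y_pred = result.predict(X_test[subset])              # probabilities between 0 and 1
y_pred_class = (y_pred > 0.5).astype(int)            # hypothetical 0.5 cutoff for class labels
accuracy = (y_pred_class == y_test['target']).mean()
print(accuracy)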
Calling model.fit() returns an instance of a results class that provides access to additional post-estimation statistics and analysis, and to prediction.
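As a brief illustration (the attributes below are standard statsmodels results attributes; variable names follow the question):
result = logit.fit()          # the fitted results instance
print(result.params)          # estimated coefficients
print(result.bse)             # standard errors
print(result.pvalues)         # p-values
print(result.conf_int())      # confidence intervals
print(result.summary())       # full summary table
y_pred = result.predict(X_test[subset])   # prediction on new explanatory data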