I fitted a model using OLS (multiple linear regression). I split my data into train and test sets (half each), and I would now like to predict values for the second half of the labels.
model = OLS(labels[:half], data[:half])
predictions = model.predict(data[half:])
The problem is that I get an error:
File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.5.0-py2.7-linux-i686.egg/statsmodels/regression/linear_model.py", line 281, in predict
return np.dot(exog, params)
ValueError: matrices are not aligned
I have the following array shapes: data.shape: (426, 215) labels.shape: (426,)
If I transpose the input to model.predict, I do get a result, but with a shape of (426, 213), so I suppose it's wrong as well (I expect one vector of 213 numbers as label predictions):
model.predict(data[half:].T)
Any idea how to get it to work?
You can also call the get_prediction method of the Results object to get the prediction together with its error estimate and confidence intervals.
Example:
import numpy as np
import statsmodels.api as sm
X = np.array([0, 1, 2, 3])
y = np.array([1, 2, 3.5, 4])
X = sm.add_constant(X)
model = sm.OLS(y, X)
results = model.fit()
predict:
# Predict at x=2.5
X_test = np.array([1, 2.5]) # "1" refers to the intercept term
results.get_prediction(X_test).summary_frame(alpha=0.05) # alpha = significance level for confidence interval
gives:
mean mean_se mean_ci_lower mean_ci_upper obs_ci_lower obs_ci_upper
0 3.675 0.198431 2.821219 4.528781 2.142416 5.207584
where mean_ci refers to the confidence interval and obs_ci refers to the prediction interval.
For statsmodels >=0.4, if I remember correctly, model.predict doesn't know about the fitted parameters and requires them in the call; see http://statsmodels.sourceforge.net/stable/generated/statsmodels.regression.linear_model.OLS.predict.html
What should work in your case is to fit the model and then use the predict method of the results instance.
model = OLS(labels[:half], data[:half])
results = model.fit()
predictions = results.predict(data[half:])
or shorter
results = OLS(labels[:half], data[:half]).fit()
predictions = results.predict(data[half:])
See http://statsmodels.sourceforge.net/stable/generated/statsmodels.regression.linear_model.RegressionResults.predict.html (the docstring there is missing).
Note: this has been changed in the development version (in a backwards-compatible way), so that predict can take advantage of "formula" information: http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.RegressionResults.predict.html
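As a rough illustration of what formula-aware prediction looks like in recent statsmodels releases, the sketch below fits a model through the formula interface and predicts from a DataFrame that contains only the raw column. The column names x and y and the DataFrame names are assumptions chosen for this example; the intercept is added by the formula, so no constant column is needed in the new data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy training data; column names "x" and "y" are illustrative
train = pd.DataFrame({"x": [0, 1, 2, 3], "y": [1, 2, 3.5, 4]})

# The formula adds the intercept automatically
results = smf.ols("y ~ x", data=train).fit()

# New data only needs the columns named in the formula
new = pd.DataFrame({"x": [2.5]})
pred = results.predict(new)

print(pred)
```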