
Python statsmodels OLS: how to save learned model to file

I am trying to learn an ordinary least squares model using Python's statsmodels library, as described here.

sm.OLS.fit() returns the learned model. Is there a way to save it to a file and reload it later? My training data is huge and it takes around half a minute to learn the model, so I was wondering whether any save/load capability exists for OLS models.

I tried the repr() method on the model object but it does not return any useful information.

Nik asked May 07 '13 08:05



2 Answers

The models and results instances all have a save and load method, so you don't need to use the pickle module directly.

Edit to add an example:

import statsmodels.api as sm

data = sm.datasets.longley.load_pandas()

data.exog['constant'] = 1

results = sm.OLS(data.endog, data.exog).fit()
results.save("longley_results.pickle")

# we should probably add a generic load to the main namespace
from statsmodels.regression.linear_model import OLSResults
new_results = OLSResults.load("longley_results.pickle")

# or more generally
from statsmodels.iolib.smpickle import load_pickle
new_results = load_pickle("longley_results.pickle")

Edit 2: We've now added a load method to the main statsmodels API in master, so you can just do:

new_results = sm.load('longley_results.pickle')
jseabold answered Oct 26 '22 04:10


I've installed the statsmodels library and found that you can save the results using Python's pickle module.

Models and results are pickleable via save/load, optionally saving the model data. [source]

As an example:

Given that you have the results saved in the variable results:

To save the file:

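import pickle

# pickle writes bytes, so open the file in binary mode ('wb')
with open('learned_model.pkl', 'wb') as f:
    pickle.dump(results, f)

To read the file:

import pickle

# read it back in binary mode ('rb') as well
with open('learned_model.pkl', 'rb') as f:
    model_results = pickle.load(f)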
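If you do this in more than one place, the two steps can be wrapped in small helpers. This is only a sketch: save_model and load_model are hypothetical names, not part of statsmodels or pickle, and a plain dict stands in for the fitted results object so that the example runs without any data. Note that pickle reads and writes bytes, so on Python 3 the file has to be opened in binary mode.

```python
import os
import pickle
import tempfile


def save_model(obj, path):
    """Pickle obj to path (hypothetical helper, not a library function)."""
    with open(path, "wb") as f:  # binary mode: pickle writes bytes
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_model(path):
    """Load a pickled object back from path (hypothetical helper)."""
    with open(path, "rb") as f:  # binary mode: pickle reads bytes
        return pickle.load(f)


# A plain dict stands in for the fitted results object (illustrative assumption).
stand_in = {"params": [1.0, 2.5], "rsquared": 0.97}

path = os.path.join(tempfile.mkdtemp(), "learned_model.pkl")
save_model(stand_in, path)
restored = load_model(path)
print(restored == stand_in)
```

With a real fit you would pass results instead of the stand-in dict.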
RMcG answered Oct 26 '22 04:10