I'm trying to understand the relationship between sklearn's .fit() method and the .predict() method; mainly, how exactly data is (typically) passed from one to the other. I haven't found another question on SO that addresses this directly, though some dance around it (e.g. here).
I've written a custom estimator using the BaseEstimator and RegressorMixin classes, but have run into a NotFittedError a handful of times as I've begun running my data through it. Could someone walk me through a simple linear regression and how the data is passed through the fit and predict methods? No need to get into the math - I understand how regressions work and what the pieces of the puzzle do. Maybe I'm overlooking the obvious and making it more complicated than it should be? But the estimator methods are feeling like a bit of a black box.
NotFittedError happens when you try to use the .predict() method of your estimator before you have trained it with the .fit() method.
Let's take LinearRegression from scikit-learn as an example.
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
>>> # y = 1 * x_0 + 2 * x_1 + 3
>>> y = np.dot(X, np.array([1, 2])) + 3
>>> reg = LinearRegression().fit(X, y)
>>> reg.score(X, y)
1.0
>>> reg.coef_
array([1., 2.])
>>> reg.intercept_
3.0000...
>>> reg.predict(np.array([[3, 5]]))
array([16.])
So with the line reg = LinearRegression().fit(X, y) you instantiate the LinearRegression class and then fit it to your data X and y, where X holds the independent variables and y the dependent variable. Once the model is trained, the beta coefficients of the linear regression are saved in the class attribute coef_, which you can access as reg.coef_. That's how the class knows how to predict when you call the .predict() method: it accesses those stored coefficients, and then it's just simple algebra to produce a prediction.
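To make that "simple algebra" concrete, here is a small sketch (assuming scikit-learn and NumPy are installed) showing that predict() just applies the stored coef_ and intercept_ to the new input:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# same data as above: y = 1 * x_0 + 2 * x_1 + 3
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
reg = LinearRegression().fit(X, y)

# predict() is just: x_new . coef_ + intercept_
x_new = np.array([3, 5])
manual = np.dot(x_new, reg.coef_) + reg.intercept_
auto = reg.predict(x_new.reshape(1, -1))[0]
print(manual, auto)  # both are (numerically) 16.0
```

The two values agree because predict() reads the very attributes (coef_, intercept_) that fit() wrote.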
So back to your error: if you haven't fitted the model to your training data, the class doesn't have the attributes it needs to make predictions. Hopefully that clears up some of the confusion about what's going on inside the class, at least with regard to how the fit() and predict() methods interact.
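For instance (a small sketch, assuming a recent scikit-learn version where predict() validates the fitted state), calling predict() on a model that was never fitted reproduces the error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.exceptions import NotFittedError

reg = LinearRegression()  # instantiated, but fit() never called

caught = False
try:
    reg.predict(np.array([[3, 5]]))
except NotFittedError as e:
    caught = True
    print(type(e).__name__, "-", e)
print(caught)  # True
```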
Ultimately, as commented above, this goes back to the fundamentals of object-oriented programming, so if you want to learn further I would read about how Python handles classes; scikit-learn models follow the same behavior.
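The same pattern can be shown with a plain Python class that has nothing to do with sklearn (a hypothetical Greeter class, purely for illustration): one method stores state on the instance, another method reads it, and calling them in the wrong order fails.

```python
class Greeter:
    """A plain Python class mirroring the fit/predict pattern."""

    def learn_name(self, name):
        # store state on the instance, just like fit() stores coef_
        self.name_ = name
        return self

    def greet(self):
        # relies on state set by learn_name(); fails with AttributeError
        # otherwise - the plain-Python analogue of NotFittedError
        return "Hello, " + self.name_

g = Greeter()
try:
    g.greet()  # no state yet
except AttributeError:
    print("call learn_name() first")

print(g.learn_name("Ada").greet())  # Hello, Ada
```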
Let's look at a toy estimator that implements linear regression:
from sklearn.base import BaseEstimator
import numpy as np

class ToyEstimator(BaseEstimator):
    def __init__(self):
        pass

    def fit(self, X, y):
        # append a column of ones so the intercept is learned as the last weight
        X = np.hstack((X, np.ones((len(X), 1))))
        # ordinary least squares: W = (X^T X)^-1 X^T y
        self.W = np.dot(np.dot(np.linalg.inv(np.dot(X.T, X)), X.T), y)
        self.coef_ = self.W[:-1]
        self.intercept_ = self.W[-1]
        return self

    def transform(self, X):
        X = np.hstack((X, np.ones((len(X), 1))))
        return np.dot(X, self.W)

X = np.random.randn(10, 3)
y = X[:, 0] * 1.11 + X[:, 1] * 2.22 + X[:, 2] * 3.33 + 4.44

reg = ToyEstimator()
reg.fit(X, y)
y_ = reg.transform(X)
print(reg.coef_, reg.intercept_)
Output:
[1.11 2.22 3.33] 4.4399999999999995
So what did the above code do?
fit: we fit/train the weights using the training data. These weights are member variables of the class [this is something you learn in OOP]. The transform method makes a prediction on the data using the trained weights, which are stored as member variables. So before calling transform you need to call fit, because transform uses the weights that are calculated during fit.
In sklearn modules, if you call transform (or predict) before fit, you get a NotFittedError exception.
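To raise that same error from your own estimator, scikit-learn provides check_is_fitted in sklearn.utils.validation. Below is a hedged sketch of the toy estimator above (renamed CheckedToyEstimator here, with the weight attribute named W_ to follow sklearn's trailing-underscore convention for fitted attributes):

```python
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

class CheckedToyEstimator(BaseEstimator):
    def fit(self, X, y):
        X = np.hstack((X, np.ones((len(X), 1))))  # bias column
        # ordinary least squares, as in the toy example above
        self.W_ = np.linalg.inv(X.T @ X) @ X.T @ y
        return self

    def transform(self, X):
        # raises NotFittedError when fit() has not been called
        check_is_fitted(self, "W_")
        X = np.hstack((X, np.ones((len(X), 1))))
        return X @ self.W_

X = np.random.randn(10, 3)
y = X @ np.array([1.11, 2.22, 3.33]) + 4.44

caught = False
try:
    CheckedToyEstimator().transform(X)  # transform before fit
except NotFittedError:
    caught = True
print(caught)  # True

fitted = CheckedToyEstimator().fit(X, y)
print(np.round(fitted.W_[:-1], 2))  # close to [1.11 2.22 3.33]
```

With check_is_fitted in place, the custom estimator fails the same way built-in sklearn estimators do, instead of with a bare AttributeError.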