
fit method in Python sklearn

I have a couple of questions about the fit method in sklearn.

Question 1: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)

Does the content of the variable model change at all during this process?

Question 2: when I do:

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = svd_1.fit(X2)

What happens to svd_1? In other words, svd_1 has already been fitted and I fit it again, so what happens to its components?

asked Jan 11 '16 17:01 by sweeeeeet



2 Answers

Question 1: Is the content of the variable model changing whatsoever during the process?

Yes. The fit method modifies the object in place and returns a reference to that same object. So take care: in the first example, all three names model, svd_1, and svd_2 refer to the same object.

from sklearn.decomposition import TruncatedSVD
model = TruncatedSVD()
svd_1 = model.fit(X1)
svd_2 = model.fit(X2)
print(model is svd_1 is svd_2)  # prints True

Question 2: What is happening to svd_1?

model and svd_1 refer to the same object, so there is absolutely no difference between the first and the second example.
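As a minimal sketch of the point above, the following uses random placeholder matrices for X1 and X2 (they are assumptions for illustration, not the asker's data) and keeps a copy of components_ before the second fit, so the overwrite becomes visible:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Placeholder data (assumption): two unrelated random matrices.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 5))
X2 = rng.normal(size=(20, 5))

model = TruncatedSVD(n_components=2, random_state=0)

svd_1 = model.fit(X1)
# Snapshot the fitted components before refitting.
components_after_X1 = svd_1.components_.copy()

# Same as model.fit(X2), because svd_1 is just another name for model.
svd_2 = svd_1.fit(X2)

print(model is svd_1 is svd_2)  # prints True
# Compare the snapshot with the current components; they differ
# whenever X1 and X2 lead to different decompositions.
print(np.allclose(components_after_X1, svd_1.components_))
```

The snapshot is needed because nothing of the first fit survives on the estimator itself once fit is called again.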

Final Remark: What happens in both examples is that the result of fit(X1) is overwritten by fit(X2), as pointed out in the answer by David Maust. If you want to have two different models fitted to two different sets of data you need to do something like this:

svd_1 = TruncatedSVD().fit(X1)
svd_2 = TruncatedSVD().fit(X2)
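
If the estimator is configured with non-default parameters, one possible way to get independent copies without repeating the constructor arguments is sklearn.base.clone, which returns a new unfitted estimator with the same parameters. The data and the name template below are assumptions made for illustration:

```python
import numpy as np
from sklearn.base import clone
from sklearn.decomposition import TruncatedSVD

# Placeholder data (assumption): two unrelated random matrices.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 5))
X2 = rng.normal(size=(20, 5))

# A configured but unfitted estimator used only as a template.
template = TruncatedSVD(n_components=3, random_state=0)

# Each clone is a separate object, so the fits do not overwrite each other.
svd_1 = clone(template).fit(X1)
svd_2 = clone(template).fit(X2)

print(svd_1 is svd_2)  # prints False
print(svd_1.n_components, svd_2.n_components)  # prints 3 3
```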
answered Oct 06 '22 23:10 by MB-F


When you call fit on a TruncatedSVD, it replaces the components with those computed from the new matrix. Some estimators and transformers in scikit-learn, such as IncrementalPCA, have a partial_fit method that builds the model incrementally as additional data is passed in.
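
A rough sketch of the difference, assuming random placeholder data split into four batches: partial_fit accumulates across calls, while a plain fit starts over.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

# Placeholder data (assumption): 100 samples with 5 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

ipca = IncrementalPCA(n_components=2)

# Feed the data in four batches of 25 rows; each call updates the model.
for batch in np.array_split(X, 4):
    ipca.partial_fit(batch)

print(ipca.n_samples_seen_)    # prints 100
print(ipca.components_.shape)  # prints (2, 5)

# Calling fit discards what was learned so far and refits from scratch.
ipca.fit(X[:25])
print(ipca.n_samples_seen_)    # prints 25
```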

answered Oct 06 '22 22:10 by David Maust