In scikit-learn I have a model (in my case a linear model):
from sklearn import linear_model

clf = linear_model.LinearRegression()
I can train this model with some data:
clf.fit(x1, y1)
But if I call fit again, it will continue training the model:
clf.fit(x2, y2)
Now clf is a model trained on both (x1, y1) and (x2, y2).
If I want to start training from scratch, I can create the model again by redefining clf:
clf = linear_model.LinearRegression()
clf.fit(x1, y1)
# save the model
# ...
clf = linear_model.LinearRegression()
clf.fit(x2, y2)
However, I don't want to define clf again.
Basically, the type of regressor is chosen beforehand, something like:
if params.linear_algorithm == 'least_squares':
    clf = linear_model.LinearRegression()
elif params.linear_algorithm == 'ridge':
    clf = linear_model.Ridge()
elif params.linear_algorithm == 'lasso':
    clf = linear_model.Lasso()
So I don't want to redefine clf inside my train function with the whole conditional block; instead, I just want to take clf, clear it of any previous training, and reuse it to train on another set of data.
Does clf have a method to clear what it has learned so far, so that when I call clf.fit(x2, y2) it is trained only on this data?
EDIT: You guys are right, the training is overwritten every time.
My problem is that I'm saving the model in a dictionary, and the dictionary just takes a reference to clf, so each time clf is retrained all previously saved entries change as well.
Redefining clf every time creates a new object, so each save then points to a different model.
Example:
model = {}
for i in range(3):
    # get the x and y
    # ...
    clf.fit(x, y)
    model[i] = clf
Any idea how to save a different model each time, instead of having every model[i] point to the same clf?
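A minimal sketch of one way around this, assuming the goal is simply a distinct object per iteration: sklearn.base.clone is a scikit-learn helper that returns an unfitted copy of an estimator with the same hyperparameters, so the conditional block only needs to run once. The data-loading here is made up for illustration:
import numpy as np
from sklearn import linear_model
from sklearn.base import clone

rng = np.random.RandomState(0)
template = linear_model.LinearRegression()  # chosen once by the conditional block

model = {}
for i in range(3):
    # stand-in data; replace with the real x and y for each iteration
    x, y = rng.randn(20, 2), rng.randn(20)
    clf = clone(template)  # fresh, unfitted copy with the same hyperparameters
    clf.fit(x, y)
    model[i] = clf         # each entry is now a distinct fitted object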
If you execute model.fit(X_train, y_train) a second time, it will overwrite all previously fitted coefficients, weights, intercept (bias), etc. Some estimators (those that have a warm_start parameter) will reuse the solution from the previous call to fit() as the initial solution for the new call when warm_start=True.
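A minimal sketch of warm_start, using SGDRegressor as one estimator that exposes the parameter (the data here is synthetic):
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X1, y1 = rng.randn(100, 3), rng.randn(100)
X2, y2 = rng.randn(100, 3), rng.randn(100)

# With warm_start=True, the second fit() starts from the coefficients
# left by the first fit() instead of re-initialising them.
reg = SGDRegressor(warm_start=True, max_iter=20, random_state=0)
reg.fit(X1, y1)
reg.fit(X2, y2)  # continues optimisation from the previous solution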
partial_fit is a handy API for incremental learning on mini-batches of a dataset that doesn't fit in memory (out-of-core learning). The primary purpose of warm_start, by contrast, is to reduce training time when refitting the same dataset with different hyperparameter values.
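And a minimal sketch of partial_fit on mini-batches, again with SGDRegressor and synthetic data:
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
reg = SGDRegressor(random_state=0)

# Each partial_fit() call updates the model incrementally with one
# mini-batch instead of discarding the previous state.
for _ in range(10):
    X_batch, y_batch = rng.randn(32, 3), rng.randn(32)
    reg.partial_fit(X_batch, y_batch)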
Your assumption is wrong. According to the Scikit-Learn docs:
Calling fit() more than once will overwrite what was learned by any previous fit().
You can therefore use your code safely and it will achieve what you need.
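A quick, self-contained way to check this (synthetic data; LinearRegression is deterministic, so a fresh fit on the same data gives identical coefficients):
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
x1, y1 = rng.randn(50, 2), rng.randn(50)
x2, y2 = rng.randn(50, 2), rng.randn(50)

clf = LinearRegression()
clf.fit(x1, y1)
clf.fit(x2, y2)  # discards everything learned from (x1, y1)

fresh = LinearRegression().fit(x2, y2)
print(np.allclose(clf.coef_, fresh.coef_))  # True: only (x2, y2) matters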
I am pretty sure it overwrites any existing information from before; the Scikit-Learn docs specify that. Unless you use warm_start=True, fit() calls will overwrite existing data.