sklearn linear regression for large data

Tags: python, scikit-learn, linear-regression, regression

Does sklearn.linear_model.LinearRegression support online/incremental learning?

I have 100 groups of data, and I am trying to fit one model on all of them together. Each group has over 10,000 instances and ~10 features, so constructing one huge matrix (10^6 by 10) leads to a memory error with sklearn. It would be nice if I could update the regressor each time with the batch of samples from a new group.
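A quick back-of-the-envelope check of the matrix size mentioned above, assuming dense float64 storage and taking 10,000 rows per group as a lower bound (the question says "over 10000"). The variable names are illustrative only.

```python
import numpy as np

n_groups = 100
rows_per_group = 10_000  # lower bound: the question says "over 10000" per group
n_features = 10

n_rows = n_groups * rows_per_group
# bytes for a dense float64 design matrix of shape (n_rows, n_features)
bytes_needed = n_rows * n_features * np.dtype(np.float64).itemsize

print(n_rows, bytes_needed, bytes_needed / 1e6)  # prints: 1000000 80000000 80.0
```

So the raw design matrix is on the order of 80 MB; whether fitting it actually fails depends on the machine's free memory and on any intermediate copies the solver makes, which this sketch does not measure.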

I found this post relevant, but its accepted solution handles online learning with a single new data point (only one instance) rather than with a batch of samples.


asked Mar 26 '14 17:03

ChuNan

2 Answers

Take a look at linear_model.SGDRegressor: it learns a linear model using stochastic gradient descent.
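For reference, a sketch of the per-sample update SGDRegressor performs with its default squared-error loss and L2 penalty, written from the objective described in the scikit-learn SGD documentation. Here η is the current learning rate, α the regularization strength, and r_i the residual on sample i; the intercept b is not regularized.

```latex
\begin{align*}
E(w, b) &= \frac{1}{n}\sum_{i=1}^{n} \tfrac{1}{2}\bigl(y_i - (w^\top x_i + b)\bigr)^2
          + \alpha \cdot \tfrac{1}{2}\lVert w \rVert_2^2 \\
r_i &= w^\top x_i + b - y_i \\
w &\leftarrow w - \eta\,\bigl(r_i\, x_i + \alpha w\bigr) \\
b &\leftarrow b - \eta\, r_i
\end{align*}
```

Each update touches only one sample (x_i, y_i), which is why the model never needs the full 10^6 by 10 matrix in memory.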

In general, sklearn has many models that support "partial_fit"; they are all useful on medium to large datasets that don't fit in RAM.
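A minimal sketch of the batch-by-batch workflow the question asks for, calling SGDRegressor.partial_fit once per group. The loader load_group is hypothetical: it generates synthetic data standing in for the asker's 100 groups, and the estimator's default settings are assumed to be adequate for this data.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic "true" coefficients so the result can be checked (illustration only).
rng = np.random.default_rng(0)
true_coef = rng.normal(size=10)

def load_group(i, n_rows=10_000, n_features=10):
    """Hypothetical loader: returns one group's (X, y) batch.

    In practice this would read group i from disk, so only one
    group is held in memory at a time.
    """
    g = np.random.default_rng(i)
    X = g.normal(size=(n_rows, n_features))
    y = X @ true_coef + 0.1 * g.normal(size=n_rows)
    return X, y

reg = SGDRegressor(random_state=0)
for i in range(100):
    X_batch, y_batch = load_group(i)
    # One SGD pass over this batch only; coefficients carry over between calls.
    reg.partial_fit(X_batch, y_batch)

print(np.round(reg.coef_, 2))
print(np.round(true_coef, 2))
```

SGD is sensitive to feature scaling; sklearn.preprocessing.StandardScaler also has a partial_fit method, so one option is a first pass over the groups to fit the scaler and a second pass to train the regressor on the scaled batches.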


answered Sep 19 '22 21:09

Yanshuai Cao

Not all algorithms can learn incrementally, that is, without seeing all of the instances at once. That said, all estimators implementing the partial_fit API are candidates for mini-batch learning, also known as "online learning".
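One way to check which estimators expose the partial_fit API is a simple attribute test; the helper supports_partial_fit is hypothetical, and this is only a heuristic, since for some estimators (for example MLPRegressor with certain solvers) whether partial_fit is usable can depend on constructor parameters.

```python
from sklearn.linear_model import LinearRegression, Ridge, SGDRegressor
from sklearn.neural_network import MLPRegressor

def supports_partial_fit(estimator_cls):
    """Hypothetical helper: True if the class exposes a partial_fit method."""
    return hasattr(estimator_cls, "partial_fit")

for cls in (LinearRegression, Ridge, SGDRegressor, MLPRegressor):
    print(cls.__name__, supports_partial_fit(cls))
```

This confirms the answer to the original question: LinearRegression has no partial_fit, while SGDRegressor does.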

Here is an article that goes over scaling strategies for incremental learning. For your purposes, have a look at the sklearn.linear_model.SGDRegressor class. It is truly online, so its memory use and convergence rate are not affected by the batch size.
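A small experiment illustrating what "truly online" means in practice. As we understand it, SGDRegressor.partial_fit makes one pass of per-sample updates over the batch it receives and carries its update counter (the t_ attribute) between calls, so with shuffling turned off, feeding the same samples in the same order should give essentially the same coefficients regardless of how they are split into batches. The synthetic data and the helper fit_in_batches are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# Synthetic data standing in for a stream of samples (illustration only).
rng = np.random.default_rng(42)
X = rng.normal(size=(6_000, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=6_000)

def fit_in_batches(batch_size):
    """Hypothetical helper: stream X, y through partial_fit in fixed-size batches."""
    # shuffle=False keeps the per-sample update order identical across runs
    reg = SGDRegressor(shuffle=False, random_state=0)
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        reg.partial_fit(X[start:stop], y[start:stop])
    return reg

small = fit_in_batches(100)
large = fit_in_batches(3_000)
print(np.allclose(small.coef_, large.coef_), np.allclose(small.intercept_, large.intercept_))
```

Only the current batch has to be in memory, and the batch size mainly trades off I/O overhead against memory; it does not change the sequence of updates.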

answered Sep 18 '22 21:09

Drewness
