ScikitLearn regression: Design matrix X too big for regression. What do I do?

I have a matrix X that has something like 7000 columns and 38000 rows. Thus it is a numpy array with shape (38000, 7000).

I instantiated the model

model = RidgeCV(alphas=(0.001, 0.01, 0.1, 1))

and then fitted it

model.fit(X, y)

where y is the response vector which is a numpy array with shape (38000,).

By running this I get a Memory Error.
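For scale, a back-of-the-envelope estimate of the dense design matrix alone (assuming float64 entries, before any copies the solver makes internally):

```python
# Rough memory footprint of X with the shapes from the question.
rows, cols = 38000, 7000
bytes_per_float64 = 8

gib = rows * cols * bytes_per_float64 / 2**30
print(f"X alone: {gib:.1f} GiB")  # roughly 2 GiB before any solver-internal copies
```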

How can I solve this?

My Idea

My first thought was to divide the matrix X "horizontally". By this I mean dividing X into, say, two matrices with the same number of columns (thus keeping all the features) but with fewer rows, and then fitting the model on each of these submatrices. But I am afraid that this is not really equivalent to fitting the whole matrix.

Any ideas?

asked Jan 03 '23 by Euler_Salter

1 Answer

This is a well-known issue that can be addressed using out-of-core learning. By googling the term you will find several ways to address the problem.

For your specific problem, first create a generator that yields a row (or several rows) of your matrix, then use the partial_fit method of your algorithm.

Standard scikit-learn algorithms such as sklearn.linear_model.LinearRegression or sklearn.linear_model.RidgeCV actually compute an exact solution, which requires the whole matrix in memory. Other methods are based on batch learning and expose a partial_fit method, like sklearn.linear_model.SGDRegressor, which fits one mini-batch at a time. That is what you are looking for.

The process is: use the generator to yield a mini-batch, apply the partial_fit method, delete the mini-batch from memory, and fetch the next one.
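The loop above can be sketched as follows. The shapes, batch size, and toy data here are illustrative, not from the question; in practice X could be a memory-mapped array (e.g. `np.load(..., mmap_mode='r')`) so it is never fully loaded into RAM:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

def minibatches(X, y, batch_size):
    """Yield successive (X_batch, y_batch) slices of the data."""
    for start in range(0, X.shape[0], batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]

# Toy data so the sketch is runnable; the real X would be far larger.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))
y = X @ rng.standard_normal(50)

# An L2 penalty makes SGDRegressor a stochastic counterpart of ridge regression.
model = SGDRegressor(penalty="l2", alpha=0.01)
for X_batch, y_batch in minibatches(X, y, batch_size=100):
    model.partial_fit(X_batch, y_batch)  # update weights; the batch can then be freed

print(model.coef_.shape)  # one learned weight per feature
```

Note that unlike RidgeCV, SGDRegressor does not cross-validate alpha for you, so you would tune it yourself (e.g. on a held-out split).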

Be aware, however, that this method is stochastic: the result depends on the order of your data and on the initialization of the weights, unlike the exact solution given by the standard regression methods that fit all the data in memory. I won't go into the details, but have a look at gradient descent optimization to understand how it works (http://ruder.io/optimizing-gradient-descent/).

answered Jan 05 '23 by Nathan