
PCA memory error in Sklearn: Alternative Dim Reduction?

I am trying to reduce the dimensionality of a very large matrix using PCA in Sklearn, but it produces a memory error (RAM required exceeds 128GB). I have already set copy=False and I'm using the less computationally expensive randomised PCA.

Is there a workaround? If not, what other dimensionality reduction techniques could I use that require less memory? Thank you.


Update: the matrix I am trying to PCA is a set of feature vectors, obtained by passing a set of training images through a pretrained CNN. The matrix has shape [300000, 51200]. PCA components tried: 100 to 500.

I want to reduce its dimensionality so I can use these features to train an ML algo, such as XGBoost. Thank you.
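
For reference, this is roughly the call that runs out of memory (a minimal sketch; copy=False and the randomised solver are what I described above, while n_components=250 is just one of the values I tried):

from sklearn.decomposition import PCA

# train_features is the [300000, 51200] CNN feature matrix
pca = PCA(n_components=250, copy=False, svd_solver='randomized')
train_features_reduced = pca.fit_transform(train_features)  # raises MemoryError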

Asked Apr 11 '17 by Chris Parry

2 Answers

You could use IncrementalPCA, available in scikit-learn: from sklearn.decomposition import IncrementalPCA. The rest of the interface is the same as PCA. You can also pass an extra argument, batch_size, which must be greater than or equal to n_components.
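
A minimal sketch of that interface (n_components and batch_size here are illustrative; train_features is the matrix from the question):

from sklearn.decomposition import IncrementalPCA

n_comp = 250
# batch_size must be at least n_components
ipca = IncrementalPCA(n_components=n_comp, batch_size=1000)
ipca.fit(train_features)                        # fits in mini-batches of 1000 rows
train_reduced = ipca.transform(train_features)

# If the full matrix does not fit in RAM, call partial_fit on chunks
# loaded from disk (e.g. a numpy memmap) instead of fit on the whole array.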

However, if you need to apply a non-linear version such as KernelPCA, there does not seem to be support for anything similar. KernelPCA absolutely explodes in its memory requirement; see the Wikipedia article on Nonlinear Dimensionality Reduction.
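
To put a rough number on that for the matrix in the question (back-of-the-envelope, assuming float64): KernelPCA materialises a full n x n kernel matrix, so

n = 300000                        # samples in the question's matrix
bytes_per_value = 8               # float64
kernel_matrix_bytes = n * n * bytes_per_value
print(kernel_matrix_bytes / 1e9)  # ~720 GB for the kernel matrix alone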

Answered by Vivek Puurkayastha

In the end, I used TruncatedSVD instead of PCA, which is capable of handling large matrices without memory issues:

from sklearn import decomposition

# Reduce the [300000, 51200] feature matrix to 250 components.
n_comp = 250
svd = decomposition.TruncatedSVD(n_components=n_comp, algorithm='arpack')
svd.fit(train_features)
# How much of the total variance the 250 components retain.
print(svd.explained_variance_ratio_.sum())

# Project both the training and test features into the reduced space.
train_features = svd.transform(train_features)
test_features = svd.transform(test_features)
Answered by Chris Parry