I am getting different results when running RandomizedPCA on sparse and dense matrices:
import numpy as np
import scipy.sparse as scsp
from sklearn.decomposition import RandomizedPCA
x = np.matrix([[1,2,3,2,0,0,0,0],
[2,3,1,0,0,0,0,3],
[1,0,0,0,2,3,2,0],
[3,0,0,0,4,5,6,0],
[0,0,4,0,0,5,6,7],
[0,6,4,5,6,0,0,0],
[7,0,5,0,7,9,0,0]])
csr_x = scsp.csr_matrix(x)
s_pca = RandomizedPCA(n_components=2)
s_pca_scores = s_pca.fit_transform(csr_x)
s_pca_weights = s_pca.explained_variance_ratio_
d_pca = RandomizedPCA(n_components=2)
d_pca_scores = d_pca.fit_transform(x)
d_pca_weights = d_pca.explained_variance_ratio_
print 'sparse matrix scores {}'.format(s_pca_scores)
print 'dense matrix scores {}'.format(d_pca_scores)
print 'sparse matrix weights {}'.format(s_pca_weights)
print 'dense matrix weights {}'.format(d_pca_weights)
Result:
sparse matrix scores [[ 1.90912166 2.37266113]
[ 1.98826835 0.67329466]
[ 3.71153199 -1.00492408]
[ 7.76361811 -2.60901625]
[ 7.39263662 -5.8950472 ]
[ 5.58268666 7.97259172]
[ 13.19312194 1.30282165]]
dense matrix scores [[-4.23432815 0.43110596]
[-3.87576857 -1.36999888]
[-0.05168291 -1.02612363]
[ 3.66039297 -1.38544473]
[ 1.48948352 -7.0723618 ]
[-4.97601287 5.49128164]
[ 7.98791603 4.93154146]]
sparse matrix weights [ 0.74988508 0.25011492]
dense matrix weights [ 0.55596761 0.44403239]
The dense version gives the same results as normal PCA, but what is going on when the matrix is sparse? Why are the results different?
For sparse data, RandomizedPCA does not center the data (mean removal), because centering would turn the sparse matrix into a dense one and could blow up memory usage. That probably explains what you observe.
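To see how much centering matters here, the following is a minimal NumPy-only sketch (Python 3, not the scikit-learn implementation) that runs a plain SVD on the matrix from the question, once as given and once with the column means removed. The helper name `top2_scores_and_ratio` is made up for illustration, and normalizing the variance ratio over the two kept components is an assumption based on the printed weights summing to 1.

```python
import numpy as np

# Same 7x8 matrix as in the question.
x = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)


def top2_scores_and_ratio(data):
    """Hypothetical helper: top-2 SVD scores of `data` exactly as passed in."""
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    scores = u[:, :2] * s[:2]      # projection onto the first two components
    var = s[:2] ** 2
    ratio = var / var.sum()        # assumption: normalized over kept components
    return scores, ratio


# No mean removal: analogous to what happens with sparse input.
uncentered_scores, uncentered_ratio = top2_scores_and_ratio(x)

# Column means removed first: analogous to ordinary PCA on dense input.
centered_scores, centered_ratio = top2_scores_and_ratio(x - x.mean(axis=0))

print("uncentered weights", uncentered_ratio)
print("centered weights  ", centered_ratio)
```

Without centering, the first component mostly captures the overall mean of the data, which is why it takes a larger share of the weight. Signs of individual components may differ from the output above, since singular vectors are only defined up to sign.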
I agree this "feature" is poorly documented. Please feel free to report an issue on GitHub to track it and improve the documentation.
Edit: we fixed that discrepancy in scikit-learn 0.15: RandomizedPCA is now deprecated for sparse data. Instead, use TruncatedSVD, which does the same as PCA without trying to center the data.
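A rough sketch of that recommendation, assuming a recent scikit-learn (Python 3) where `TruncatedSVD` and `PCA` with `svd_solver="full"` are available. The variable names and the `random_state=0` choice are illustrative only. It applies TruncatedSVD to the sparse matrix, then checks that centering a dense copy by hand before TruncatedSVD matches PCA up to sign.

```python
import numpy as np
import scipy.sparse as scsp
from sklearn.decomposition import PCA, TruncatedSVD

# Same 7x8 matrix as in the question.
x = np.array([[1, 2, 3, 2, 0, 0, 0, 0],
              [2, 3, 1, 0, 0, 0, 0, 3],
              [1, 0, 0, 0, 2, 3, 2, 0],
              [3, 0, 0, 0, 4, 5, 6, 0],
              [0, 0, 4, 0, 0, 5, 6, 7],
              [0, 6, 4, 5, 6, 0, 0, 0],
              [7, 0, 5, 0, 7, 9, 0, 0]], dtype=float)
csr_x = scsp.csr_matrix(x)

# TruncatedSVD accepts sparse input directly and does not center it.
svd = TruncatedSVD(n_components=2, random_state=0)
svd_scores = svd.fit_transform(csr_x)

# PCA centers the (dense) data before decomposing it.
pca = PCA(n_components=2, svd_solver="full")
pca_scores = pca.fit_transform(x)

# Centering a dense copy by hand and then applying TruncatedSVD
# should agree with PCA, up to the sign of each component.
x_centered = x - x.mean(axis=0)
centered_scores = TruncatedSVD(n_components=2, random_state=0).fit_transform(x_centered)

print("TruncatedSVD on sparse input:\n", svd_scores)
print("PCA on dense input:\n", pca_scores)
print("match after manual centering:",
      np.allclose(np.abs(centered_scores), np.abs(pca_scores), atol=1e-6))
```

Manual centering is only practical when the dense matrix fits in memory. For large sparse data, the uncentered TruncatedSVD result is usually what you work with.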