I'm attempting to use sklearn's PCA functionality to reduce my data to 2 dimensions. However, I noticed that the result of fit_transform() does not match the result of multiplying the components_ attribute by my input data.
Why don't these match? Which result is correct?
def test_pca_fit_transform(self):
    import numpy as np
    from sklearn.decomposition import PCA
    # each column of input_data is an observation, each row is a dimension
    input_data = np.matrix([[11, 4, 9, 3, 2, 2], [7, 2, 8, 2, 0, 2], [3, 1, 2, 5, 2, 9]])
    # method 1: let sklearn project the (transposed) data
    pca = PCA(n_components=2)
    data2d = pca.fit_transform(input_data.T)
    # method 2: multiply the fitted components with the raw input data
    component_matrix = np.matrix(pca.components_)
    data2d_mult = (component_matrix * input_data).T
    np.testing.assert_almost_equal(data2d, data2d_mult)
    # FAILS!!!
The only step you are missing (which sklearn handles internally) is centering the data. PCA requires centered data; if yours is not centered, sklearn's PCA essentially does the following near the start of its fit method:
X -= X.mean(axis=0)
which centers the data along the first axis, i.e. subtracts each feature's mean.
To get the same result as sklearn (which is the correct one), you just need to center your data, either before fit or before applying your method 2.
Here is a working example:
import numpy as np
from sklearn.decomposition import PCA

X = np.array([[11, 4, 9, 3, 2, 2], [7, 2, 8, 2, 0, 2], [3, 1, 2, 5, 2, 9]])
X = X.T.copy()
# PCA
pca = PCA(n_components=2)
data = pca.fit_transform(X)
# Your method 2 (no centering)
data2 = X.dot(pca.components_.T)
# Centering the data before method 2
data3 = X - X.mean(axis=0)
data3 = data3.dot(pca.components_.T)
# Compare
print(np.allclose(data, data2))  # prints False
print(np.allclose(data, data3))  # prints True
Note that I use .dot on plain numpy arrays instead of * on numpy matrix objects, since I prefer to avoid np.matrix whenever possible; the result is the same either way.
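As a further check, the fitted PCA object stores the mean it subtracted in its mean_ attribute, so you can reproduce fit_transform without recomputing the mean yourself. A minimal sketch, assuming the same X as above:

import numpy as np
from sklearn.decomposition import PCA

X = np.array([[11, 4, 9, 3, 2, 2], [7, 2, 8, 2, 0, 2], [3, 1, 2, 5, 2, 9]]).T
pca = PCA(n_components=2)
data = pca.fit_transform(X)
# Center with the mean sklearn stored during fit, then project onto the components.
data4 = (X - pca.mean_).dot(pca.components_.T)
print(np.allclose(data, data4))  # prints True

This mirrors what pca.transform does internally (with the default whiten=False).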