How to compare predictive power of PCA and NMF

I would like to compare the output of an algorithm on data preprocessed in two different ways: with NMF and with PCA. To get a somewhat comparable result, instead of simply choosing the same number of components for both PCA and NMF, I would like to pick the number that explains e.g. 95% of the retained variance.

I was wondering if it's possible to identify the variance retained in each component of NMF.

For instance, using PCA this would be given by: retainedVariance(i) = eigenvalue(i) / sum(eigenvalues)
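For reference, here is how that PCA computation looks in scikit-learn (a minimal illustrative sketch, not from the original question; explained_variance_ holds the per-component eigenvalues):

import numpy as np
from sklearn import datasets, decomposition

# illustrative sketch: retained variance per PCA component
X, _ = datasets.load_iris(return_X_y=True)
pca = decomposition.PCA().fit(X)
eigenvalues = pca.explained_variance_      # eigenvalue(i) for each component
print(eigenvalues / eigenvalues.sum())     # retainedVariance(i)
print(pca.explained_variance_ratio_)       # sklearn computes the same ratio directly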

Any ideas?

asked Jan 08 '18 by Phil D

People also ask

What is the difference between NMF and PCA?

In the classic example of face images, NMF splits a face into a number of features that one could interpret as "nose", "eyes", etc., which you can combine to recreate the original image. PCA instead gives you "generic" faces ordered by how well they capture the original one.

Is PCA matrix factorization?

In a sense, PCA is a kind of matrix factorization, since it decomposes a matrix X into WΣVᵀ. However, matrix factorization is a very general term.
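As a quick illustration of that connection (a sketch, not from the original page), PCA's scores and components fall out of the SVD of the centered data:

import numpy as np

# illustrative sketch: PCA as the factorization Xc ≈ (UΣ)Vᵀ of the centered data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
Xc = X - X.mean(axis=0)                    # PCA operates on centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S                             # the compressed representation
print(np.allclose(scores @ Vt, Xc))        # True: the factorization reconstructs Xc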

Is NMF dimensionality reduction?

Nonnegative matrix factorization (NMF) is a powerful linear technique for dimension reduction. It reduces the dimensionality of data, making learning algorithms faster and more effective. Although NMF and its applications have been developed for more than a decade, they still have limitations in modeling and performance.

What is NMF algorithm?

Non-negative matrix factorization (NMF or NNMF), also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements.
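A minimal sketch of that factorization with scikit-learn's NMF (illustrative only; the matrix names follow the definition above):

import numpy as np
from sklearn.decomposition import NMF

# illustrative sketch: V ≈ dot(W, H) with all three matrices non-negative
V = np.abs(np.random.default_rng(0).normal(size=(20, 10)))
model = NMF(n_components=3, random_state=0, max_iter=500)
W = model.fit_transform(V)                 # shape (20, 3), non-negative
H = model.components_                      # shape (3, 10), non-negative
print(W.min() >= 0, H.min() >= 0)          # True True
print(np.linalg.norm(V - W @ H))           # reconstruction error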


1 Answer

TL;DR

You should loop over different values of n_components and estimate the explained_variance_score of the decoded X at each iteration. This will show you how many components you need to explain 95% of the variance.

Now I will explain why.

Relationship between PCA and NMF

NMF and PCA, like many other unsupervised learning algorithms, aim to do two things:

  • encode input X into a compressed representation H;
  • decode H back to X', which should be as close to X as possible.

They do this in somewhat similar ways:

  • Decoding is similar in PCA and NMF: they output X' = dot(H, W), where W is a learned matrix parameter.
  • Encoding is different. In PCA, it is also linear: H = dot(X, V), where V is also a learned parameter. In NMF, H = argmin(loss(X, H, W)) (with respect to H only), where loss is the mean squared error between X and dot(H, W), plus some additional penalties. Minimization is performed by coordinate descent, and the result may be nonlinear in X (see the sketch after this list).
  • Training is also different. PCA learns sequentially: the first component minimizes MSE without constraints, and each subsequent kth component minimizes the residual MSE subject to being orthogonal to the previous components. NMF minimizes the same loss(X, H, W) as when encoding, but now with respect to both H and W.
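Here is a rough sketch of that encoding difference with scikit-learn (illustrative, not part of the original answer; the Iris data is non-negative, so NMF accepts it directly):

import numpy as np
from sklearn import datasets, decomposition

X, _ = datasets.load_iris(return_X_y=True)
pca = decomposition.PCA(n_components=2).fit(X)
nmf = decomposition.NMF(n_components=2, max_iter=500).fit(X)

# PCA encoding is an exact linear map: H = dot(X - mean, V)
H_pca = (X - pca.mean_) @ pca.components_.T
print(np.allclose(H_pca, pca.transform(X)))   # True

# NMF encoding solves argmin over H of loss(X, H, W) with W fixed,
# so there is no closed-form linear map behind transform()
H_nmf = nmf.transform(X)
print(H_nmf.min() >= 0)                       # True: H is constrained to be non-negative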

How to measure performance of dimensionality reduction

If you want to measure performance of an encoding/decoding algorithm, you can follow the usual steps:

  1. Train your encoder+decoder on X_train
  2. To measure in-sample performance, compare X_train'=decode(encode(X_train)) with X_train using your preferred metric (e.g. MAE, RMSE, or explained variance)
  3. To measure out-of-sample performance (generalizing ability) of your algorithm, do step 2 with the unseen X_test.

Let's try it with PCA and NMF!

from sklearn import decomposition, datasets, model_selection, preprocessing, metrics
# use the well-known Iris dataset
X, _ = datasets.load_iris(return_X_y=True)
# split the dataset, to measure overfitting
X_train, X_test = model_selection.train_test_split(X, test_size=0.5, random_state=1)
# I scale the data in order to give equal importance to all its dimensions
# NMF does not allow negative input, so I don't center the data
scaler = preprocessing.StandardScaler(with_mean=False).fit(X_train)
X_train_sc = scaler.transform(X_train)
X_test_sc = scaler.transform(X_test)
# train both decomposers
pca = decomposition.PCA(n_components=2).fit(X_train_sc)
nmf = decomposition.NMF(n_components=2).fit(X_train_sc)
print(sum(pca.explained_variance_ratio_))

This prints the explained variance ratio 0.9536930834362043 - the default metric of PCA, estimated using its eigenvalues. We can measure the same thing more directly - by applying a metric to the actual and "predicted" values:

def get_score(model, data, scorer=metrics.explained_variance_score):
    """ Estimate performance of the model on the data """
    prediction = model.inverse_transform(model.transform(data))
    return scorer(data, prediction)

print('train set performance')
print(get_score(pca, X_train_sc))
print(get_score(nmf, X_train_sc))

print('test set performance')
print(get_score(pca, X_test_sc))
print(get_score(nmf, X_test_sc))

which gives

train set performance
0.9536930834362043 # same as before!
0.937291711378812 
test set performance
0.9597828443047842
0.9590555069007827

You can see that on the training set PCA performs better than NMF, but on the test set their performance is almost identical. This happens because NMF applies a lot of regularization:

  • H and W (the learned parameters) must be non-negative
  • H should be as small as possible (L1 and L2 penalties)
  • W should be as small as possible (L1 and L2 penalties)

These regularizations make NMF fit the training data less closely than it otherwise could, but they can improve its generalizing ability, which is what happened in our case.
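If you want to control those penalties explicitly, scikit-learn exposes them as constructor arguments (a sketch assuming scikit-learn >= 1.0, where the parameters are named alpha_W, alpha_H and l1_ratio; older versions used a single alpha, and note that the default strengths are zero, so only the non-negativity constraint is active unless you set them):

from sklearn import decomposition

# illustrative: NMF with explicit L1/L2 regularization (scikit-learn >= 1.0 names)
nmf_reg = decomposition.NMF(
    n_components=2,
    alpha_W=0.1,       # penalty strength on W
    alpha_H='same',    # reuse the same strength for H
    l1_ratio=0.5,      # 0 = pure L2, 1 = pure L1
    max_iter=500,
).fit(X_train_sc)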

How to choose the number of components

In PCA, it is simple, because its components h_1, h_2, ... h_k are learned sequentially. If you add a new component h_(k+1), the first k will not change. Thus, you can estimate the performance of each component, and these estimates will not depend on the number of components. This makes it possible for PCA to output the explained_variance_ratio_ array after only a single fit to the data.
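So for PCA, a single fit is enough to find how many components hit a 95% target - for example (a minimal sketch, reusing X_train_sc from above):

import numpy as np
from sklearn import decomposition

pca_full = decomposition.PCA().fit(X_train_sc)           # fit once with all components
cumvar = np.cumsum(pca_full.explained_variance_ratio_)   # cumulative explained variance
n_95 = int(np.searchsorted(cumvar, 0.95)) + 1            # first k reaching 95%
print(n_95)

scikit-learn can also do this selection for you: passing a float, as in PCA(n_components=0.95), keeps just enough components to explain 95% of the variance.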

NMF is more complex, because all its components are trained at the same time, and each one depends on all the rest. Thus, if you add the (k+1)-th component, the first k components will change, and you cannot match each particular component with its explained variance (or any other metric).

But what you can do is fit a new instance of NMF for each number of components and compare the total explained variance:

ks = [1,2,3,4]
perfs_train = []
perfs_test = []
for k in ks:
    nmf = decomposition.NMF(n_components=k).fit(X_train_sc)
    perfs_train.append(get_score(nmf, X_train_sc))
    perfs_test.append(get_score(nmf, X_test_sc))
print(perfs_train)
print(perfs_test)

which would give

[0.3236945680665101, 0.937291711378812, 0.995459457205891, 0.9974027602663655]
[0.26186701106012833, 0.9590555069007827, 0.9941424954209546, 0.9968456603914185]

Thus, three components (judging by train set performance) or two components (judging by test set performance) are required to explain at least 95% of the variance. Note that this case is unusual, caused by the small size of the training and test data: usually performance degrades a little on the test set, but here it actually improved slightly.

answered Sep 28 '22 by David Dale