
Get U, Sigma, V* matrix from Truncated SVD in scikit-learn

I am using truncated SVD from the scikit-learn package.

In the definition of SVD, an original matrix A is approximated as a product A ≈ UΣV*, where U and V have orthonormal columns and Σ is a diagonal matrix with non-negative entries.

I need to get the U, Σ and V* matrices.

Looking at the source code, I found that V* is stored in the self.components_ field after calling fit_transform.

Is it possible to get U and Σ matrices?

My code:

import sklearn.decomposition as skd
import numpy as np

matrix = np.random.random((20,20))
trsvd = skd.TruncatedSVD(n_components=15)
transformed = trsvd.fit_transform(matrix)
VT = trsvd.components_
Vektor88 asked Jul 20 '15 18:07



3 Answers

Looking into the source via the link you provided, TruncatedSVD is basically a wrapper around sklearn.utils.extmath.randomized_svd; you can call it yourself, with X being your matrix:

from sklearn.utils.extmath import randomized_svd

U, Sigma, VT = randomized_svd(X, 
                              n_components=15,
                              n_iter=5,
                              random_state=None)
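
To check what the call above returns, here is a minimal sketch. It assumes a small random dense matrix standing in for X; the seed and the name X_approx are illustrative and not part of the answer.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.RandomState(0)        # fixed seed, chosen only for reproducibility
X = rng.random_sample((20, 20))       # stand-in for the question's matrix

U, Sigma, VT = randomized_svd(X, n_components=15, n_iter=5, random_state=0)

print(U.shape, Sigma.shape, VT.shape)  # (20, 15) (15,) (15, 20)

# Rank-15 approximation of X built from the three factors.
X_approx = U @ np.diag(Sigma) @ VT
```

Sigma comes back as a 1-D array of singular values rather than a diagonal matrix, so np.diag is needed whenever the matrix form is wanted.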
maxymoo answered Oct 16 '22 20:10


One can use scipy.sparse.linalg.svds (for dense matrices you can use scipy.linalg.svd).

import numpy as np
from scipy.sparse.linalg import svds

matrix = np.random.random((20, 20))
num_components = 2
u, s, vt = svds(matrix, k=num_components)  # vt holds V* (rows are right singular vectors)
X = u.dot(np.diag(s))  # U * Sigma, the analogue of TruncatedSVD's fit_transform output
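
One thing to watch when comparing with TruncatedSVD: it is safer not to rely on the order in which svds returns the singular values. A small sketch, assuming the same kind of random dense matrix as above (the seed and the names order and X_reduced are illustrative), that sorts the triplets so the largest singular value comes first:

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.RandomState(0)        # fixed seed, chosen only for reproducibility
matrix = rng.random_sample((20, 20))

u, s, vt = svds(matrix, k=2)

# Reorder the triplets so the largest singular value comes first,
# without relying on the order svds happens to return.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

# Same as u @ np.diag(s); broadcasting scales each column of u.
X_reduced = u * s
print(X_reduced.shape)  # (20, 2)
```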

If you're working with really big sparse matrices (perhaps you're working with natural text), even scipy.sparse.linalg.svds might exhaust your computer's RAM. In such cases, consider the sparsesvd package, which uses SVDLIBC and is what gensim uses under the hood.

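import numpy as np
from scipy.sparse import csc_matrix
from sparsesvd import sparsesvd

k = 2  # number of singular triplets to compute
X = csc_matrix(np.random.random((30, 30)))  # sparsesvd expects a CSC sparse matrix
ut, s, vt = sparsesvd(X, k)
projected = (X * ut.T)/s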
Vektor88 answered Oct 16 '22 19:10


Just as a note:

svd.transform(X)

and

svd.fit_transform(X)

generate U * Sigma.

svd.singular_values_

generates Sigma in vector form.

svd.components_

generates VT. Maybe we can use

svd.transform(X).dot(np.linalg.inv(np.diag(svd.singular_values_)))

to get U because U * Sigma * Sigma ^ -1 = U * I = U.
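
A minimal sketch of that idea, assuming the 20x20 random matrix and n_components=15 from the question (the seed and the name US are illustrative). Dividing by the singular values through broadcasting gives the same result as multiplying by the inverse of the diagonal matrix, without building or inverting it:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(0)        # fixed seed, chosen only for reproducibility
matrix = rng.random_sample((20, 20))

svd = TruncatedSVD(n_components=15, random_state=0)
US = svd.fit_transform(matrix)        # U * Sigma, shape (20, 15)
Sigma = svd.singular_values_          # 1-D array, shape (15,)
VT = svd.components_                  # shape (15, 20)

# Divide each column of U * Sigma by its singular value to recover U.
U = US / Sigma

print(U.shape, Sigma.shape, VT.shape)  # (20, 15) (15,) (15, 20)
```

This only works when none of the kept singular values is zero (or numerically close to zero); otherwise the division is ill-conditioned.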

Yin answered Oct 16 '22 18:10