I am using truncated SVD from the scikit-learn package.
In the definition of SVD, an original matrix A is approximated as a product A ≈ UΣV* where U and V have orthonormal columns, and Σ is non-negative diagonal.
I need to get the U, Σ and V* matrices.
Looking at the source code here, I found out that V* is stored in the self.components_ field after calling fit_transform.
Is it possible to get U and Σ matrices?
My code:
import sklearn.decomposition as skd
import numpy as np
matrix = np.random.random((20,20))
trsvd = skd.TruncatedSVD(n_components=15)
transformed = trsvd.fit_transform(matrix)
VT = trsvd.components_
Unlike a regular SVD, truncated SVD produces a factorization in which the number of columns is set by a truncation parameter. For example, given an n x n matrix, a full SVD outputs matrices with n columns, whereas truncated SVD outputs matrices with only the specified number of columns.
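To make the shape difference concrete, here is a minimal sketch that computes a full SVD with NumPy and truncates it by hand. The matrix A and the value k = 15 are illustrative assumptions that mirror the question's 20 x 20 example; this is not how TruncatedSVD computes its result internally.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((20, 20))  # illustrative matrix, same size as in the question

# Full SVD: every factor keeps all 20 singular triplets.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Truncate to the k largest singular values
# (np.linalg.svd returns them in descending order).
k = 15
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Rank-k approximation of A built from the truncated factors.
A_k = U_k @ np.diag(s_k) @ Vt_k

print(U.shape, s.shape, Vt.shape)        # (20, 20) (20,) (20, 20)
print(U_k.shape, s_k.shape, Vt_k.shape)  # (20, 15) (15,) (15, 20)
```

The approximation error in the Frobenius norm equals the root of the sum of the squared discarded singular values, which is why dropping small singular values loses little.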
From the scikit-learn documentation: fit(X) returns the transformer object, while fit_transform(X) fits the model to X and performs dimensionality reduction on X. In both cases X is an {array-like, sparse matrix} of shape (n_samples, n_features).
TruncatedSVD is very similar to PCA, but differs in that the matrix does not need to be centered. When the columnwise (per-feature) means of X are subtracted from the feature values, truncated SVD on the resulting matrix is equivalent to PCA.
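The equivalence with PCA can be checked numerically. The sketch below is only an illustration under assumptions of my own: the data X, the choice k = 3, and the use of the exact solvers (svd_solver="full" and algorithm="arpack") so that the comparison is not blurred by randomized approximation.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
X = rng.random((50, 8))            # illustrative data: 50 samples, 8 features
X_centered = X - X.mean(axis=0)    # subtract the per-feature means

k = 3
pca = PCA(n_components=k, svd_solver="full").fit(X)  # PCA centers internally
tsvd = TruncatedSVD(n_components=k, algorithm="arpack", random_state=0).fit(X_centered)

# Singular values should agree; components should agree up to a sign per row.
same_singular_values = np.allclose(pca.singular_values_, tsvd.singular_values_)
same_components = np.allclose(np.abs(pca.components_), np.abs(tsvd.components_))
print(same_singular_values, same_components)
```

Comparing absolute values sidesteps the sign ambiguity of singular vectors, since each pair (u, v) can be negated together without changing the product.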
Looking into the source via the link you provided, TruncatedSVD (with its default algorithm) is basically a wrapper around sklearn.utils.extmath.randomized_svd; you can manually call this yourself like this:
from sklearn.utils.extmath import randomized_svd
# X is the data matrix (named "matrix" in the question)
U, Sigma, VT = randomized_svd(X,
                              n_components=15,
                              n_iter=5,
                              random_state=None)
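As a quick sanity check of what randomized_svd returns, here is a hedged sketch on a matrix I construct to have exact rank 5 (an assumption made purely so that a 5-component factorization can reproduce it). The names X, reconstruction_error and U_is_orthonormal are mine, not part of the library.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(0)
# Illustrative matrix of exact rank 5 (product of a 20x5 and a 5x20 factor).
X = rng.random((20, 5)) @ rng.random((5, 20))

U, Sigma, VT = randomized_svd(X, n_components=5, n_iter=5, random_state=0)

# U has shape (20, 5), Sigma is a 1-D array of length 5, VT has shape (5, 20).
reconstruction_error = np.linalg.norm(X - U @ np.diag(Sigma) @ VT)
U_is_orthonormal = np.allclose(U.T @ U, np.eye(5))
print(U.shape, Sigma.shape, VT.shape)
print(reconstruction_error, U_is_orthonormal)
```

Note that Sigma comes back as a vector, so np.diag(Sigma) is needed to form the diagonal matrix Σ. On a full-rank matrix the reconstruction would only be approximate, since the discarded singular values carry part of the signal.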
One can use scipy.sparse.linalg.svds (for dense matrices you can use svd).
import numpy as np
from scipy.sparse.linalg import svds
matrix = np.random.random((20, 20))
num_components = 2
u, s, vt = svds(matrix, k=num_components)  # the third output is V transposed
X = u.dot(np.diag(s))  # matches TruncatedSVD's output up to column order and sign
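One caveat with svds: it does not follow the descending-order convention of scipy.linalg.svd, and in practice the singular values often come back in ascending order. A minimal sketch of reordering the triplets is below; the variable names order and X_reduced are my own.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
matrix = rng.random((20, 20))
k = 2

u, s, vt = svds(matrix, k=k)

# Reorder the singular triplets so the largest singular value comes first.
order = np.argsort(s)[::-1]
u, s, vt = u[:, order], s[order], vt[order, :]

# Scale each column of u by its singular value; same as u @ np.diag(s).
X_reduced = u * s
print(s)
print(X_reduced.shape)  # (20, 2)
```

After reordering, s should match the first k singular values from a dense SVD, while the columns of u and rows of vt may still differ from other solvers by a sign.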
If you're working with really big sparse matrices (perhaps you're working with natural text), even scipy.sparse.linalg.svds might blow up your computer's RAM. In such cases, consider the sparsesvd package, which uses SVDLIBC and is what gensim uses under the hood.
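import numpy as np
import scipy.sparse
from sparsesvd import sparsesvd
X = scipy.sparse.csc_matrix(np.random.random((30, 30)))  # sparsesvd expects a CSC sparse matrix
k = 10  # example value: number of singular triplets to compute
ut, s, vt = sparsesvd(X, k)  # X ≈ ut.T · diag(s) · vt
projected = (X * vt.T) / s  # X · V · Σ^-1 recovers U (equal to ut.T)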
Just as a note: svd.transform(X) and svd.fit_transform(X) generate U * Sigma, svd.singular_values_ generates Sigma in vector form, and svd.components_ generates VT. Maybe we can use svd.transform(X).dot(np.linalg.inv(np.diag(svd.singular_values_))) to get U, because U * Sigma * Sigma^-1 = U * I = U.
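Since Sigma is diagonal, the explicit inverse is not needed: dividing each column of U * Sigma by the matching singular value does the same thing. The sketch below assumes all retained singular values are nonzero, and I pick algorithm="arpack" (my choice, not the default) so that fit_transform returns U * Sigma to machine precision; with the default randomized solver the relation should hold only approximately.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
matrix = rng.random((20, 20))

svd = TruncatedSVD(n_components=15, algorithm="arpack", random_state=0)
US = svd.fit_transform(matrix)  # U * Sigma, shape (20, 15)
Sigma = svd.singular_values_    # 1-D array of length 15
VT = svd.components_            # shape (15, 20)

# Broadcasting divides column j of US by Sigma[j], i.e. multiplies by Sigma^-1.
U = US / Sigma

# U should have orthonormal columns if the recovery worked.
print(np.allclose(U.T @ U, np.eye(15)))
```

With U, Sigma and VT in hand, the rank-15 approximation of the original matrix is U @ np.diag(Sigma) @ VT.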