Get U, Sigma, V* matrix from Truncated SVD in scikit-learn

Tags:

python

scipy

scikit-learn

sparse-matrix

svd

I am using truncated SVD from the scikit-learn package.

In the definition of SVD, an original matrix A is approximated as a product A ≈ UΣV*, where U and V have orthonormal columns and Σ is a non-negative diagonal matrix.

I need to get the U, Σ and V* matrices.

Looking at the source code here, I found out that V* is stored in the self.components_ field after calling fit_transform.

Is it possible to get U and Σ matrices?

My code:

import sklearn.decomposition as skd
import numpy as np

matrix = np.random.random((20,20))
trsvd = skd.TruncatedSVD(n_components=15)
transformed = trsvd.fit_transform(matrix)
VT = trsvd.components_

232

asked Jul 20 '15 18:07

Vektor88

3 Answers

Looking at the source linked in the question, TruncatedSVD is basically a wrapper around sklearn.utils.extmath.randomized_svd (at least with its default algorithm="randomized"); you can call that function directly:

from sklearn.utils.extmath import randomized_svd

# X is the input matrix (`matrix` in the question); Sigma comes back as a 1-D array
U, Sigma, VT = randomized_svd(X,
                              n_components=15,
                              n_iter=5,
                              random_state=None)
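
As a quick sanity check of the call above, here is a minimal sketch that runs it on the same kind of 20x20 random matrix as in the question and inspects the three factors. The fixed seed, the variable names `approx` and `rel_error`, and the chosen sizes are assumptions made for illustration only.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

# Illustrative data: same shape as the matrix in the question, fixed seed.
rng = np.random.RandomState(0)
matrix = rng.random_sample((20, 20))

U, Sigma, VT = randomized_svd(matrix, n_components=15, n_iter=5, random_state=0)

# Sigma is a 1-D array of singular values, so build the diagonal explicitly.
approx = U @ np.diag(Sigma) @ VT

# Relative Frobenius error of the rank-15 approximation.
rel_error = np.linalg.norm(matrix - approx) / np.linalg.norm(matrix)

print(U.shape, Sigma.shape, VT.shape)
print("relative reconstruction error:", rel_error)
```

Because Sigma is returned as a vector, `U * Sigma` (broadcasting over columns) gives the same product as `U @ np.diag(Sigma)` without building the diagonal matrix.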

113

answered Oct 16 '22 20:10

maxymoo

One can use scipy.sparse.linalg.svds (for dense matrices you can use a dense svd, e.g. numpy.linalg.svd).

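import numpy as np
from scipy.sparse.linalg import svds

matrix = np.random.random((20, 20))
num_components = 2
# The third return value is V transposed. The singular values are not
# necessarily in the descending order that TruncatedSVD uses.
u, s, vt = svds(matrix, k=num_components)
X = u.dot(np.diag(s))  # output of TruncatedSVD, up to column order and sign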
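
For the dense case mentioned above, a minimal sketch using numpy.linalg.svd might look like the following. Computing the thin SVD and slicing off the leading components is one straightforward way to truncate, not necessarily what any particular library does internally; the seed and the names `X_reduced` and `rank_k_approx` are assumptions for illustration.

```python
import numpy as np

# Illustrative dense input with a fixed seed.
rng = np.random.RandomState(0)
matrix = rng.random_sample((20, 20))
num_components = 2

# Thin SVD; numpy returns the singular values sorted in descending order.
u_full, s_full, vt_full = np.linalg.svd(matrix, full_matrices=False)

# Keep only the leading components.
u = u_full[:, :num_components]
s = s_full[:num_components]
vt = vt_full[:num_components, :]

# U * Sigma via broadcasting: column j of u is scaled by s[j].
X_reduced = u * s

# Rank-k approximation of the original matrix.
rank_k_approx = X_reduced @ vt

print(X_reduced.shape, rank_k_approx.shape)
```

Since the rows of `vt` are orthonormal, `matrix @ vt.T` gives the same `X_reduced`, which is the projection that a fitted truncated SVD applies to its input.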

If you're working with really big sparse matrices (perhaps you're working with natural text), even scipy.sparse.linalg.svds might exhaust your computer's RAM. In such cases, consider the sparsesvd package, which uses SVDLIBC and is what gensim uses under the hood.

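import numpy as np
from scipy.sparse import csc_matrix
from sparsesvd import sparsesvd

k = 2  # number of singular triplets to compute
X = csc_matrix(np.random.random((30, 30)))  # sparsesvd expects a CSC matrix
ut, s, vt = sparsesvd(X, k)
# X V = U Sigma, so dividing by s recovers U (i.e. ut.T)
projected = X.dot(vt.T) / s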

answered Oct 16 '22 19:10

Vektor88

Just as a note:

svd.transform(X)

and

svd.fit_transform(X)

both return U * Sigma.

svd.singular_values_

holds Sigma in vector form (a 1-D array of the singular values).

svd.components_

holds VT. Maybe we can use

svd.transform(X).dot(np.linalg.inv(np.diag(svd.singular_values_)))

to get U, because U * Sigma * Sigma^-1 = U * I = U.
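
The idea in this answer can be sketched as follows. This is only an illustration under stated assumptions: it uses algorithm="arpack" and a fixed seed so that the factors are accurate to numerical precision on a small dense matrix, and it divides by the singular values with broadcasting instead of inverting a diagonal matrix, which gives the same result when all singular values are non-zero.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Illustrative data with a fixed seed.
rng = np.random.RandomState(0)
X = rng.random_sample((20, 20))

# "arpack" is chosen here (an assumption) to get accurate factors.
svd = TruncatedSVD(n_components=5, algorithm="arpack", random_state=0)
US = svd.fit_transform(X)        # U * Sigma, shape (20, 5)
Sigma = svd.singular_values_     # 1-D array, shape (5,)
VT = svd.components_             # shape (5, 20)

# Undo the scaling: column j of US is divided by Sigma[j].
U = US / Sigma

# U should have orthonormal columns if the decomposition is accurate.
print(np.allclose(U.T @ U, np.eye(5), atol=1e-6))
```

With the default randomized algorithm the same division applies, but the recovered U is only as accurate as the randomized approximation itself.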

answered Oct 16 '22 18:10

Yin
