How to interpret Singular Value Decomposition results (Python 3)?

I'm trying to learn how to reduce dimensionality in datasets. I came across some tutorials on Principal Component Analysis and Singular Value Decomposition. My (overly simplified) understanding is that they find the direction of greatest variance and then, sequentially, the directions of the next highest variance.

I'm confused about how to interpret the output matrices. I looked at the documentation, but it wasn't much help. I followed some tutorials and was still not sure what the resulting matrices were exactly. Below is some code, run on a dataset from sklearn.datasets, to show what I'm working with.

My initial input array is a (n x m) matrix of n samples and m attributes. I could do a common PCA plot of PC1 vs. PC2 but how do I know which dimensions each PC represents?

Sorry if this is a basic question. A lot of the resources are very math-heavy, which I'm fine with, but a more intuitive answer would be useful. Nothing I've seen talks about how to interpret the output in terms of the original labeled data.

I'm open to using sklearn's decomposition.PCA

#Singular Value Decomposition
import numpy as np

# X is the (n x m) data matrix; the shapes below correspond to a 442-sample,
# 10-attribute dataset (e.g. the diabetes dataset from sklearn.datasets)
U, s, V = np.linalg.svd(X, full_matrices=True)
print(U.shape, s.shape, V.shape, sep="\n")
# (442, 442)
# (10,)
# (10, 10)
asked Jun 10 '16 by O.rka
People also ask

How do you interpret singular values?

The singular values referred to in the name “singular value decomposition” are simply the length and width of the transformed square, and those values can tell you a lot of things. For example, if one of the singular values is 0, this means that our transformation flattens our square.

What is singular value decomposition in Python?

Singular Value Decomposition, or SVD, has a wide array of applications. These include dimensionality reduction, image compression, and denoising data. In essence, SVD states that a matrix can be represented as the product of three other matrices.

What does SVD return?

The SVD can be calculated by calling the svd() function. The function takes a matrix and returns the U, Sigma and V^T elements. The Sigma diagonal matrix is returned as a vector of singular values.
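For reference, here is a minimal sketch (on a made-up 4 x 3 matrix, not the question's data) of how the vector of singular values returned by numpy's svd() can be turned back into the diagonal Sigma matrix and multiplied out to reconstruct the input:

import numpy as np

A = np.arange(12, dtype=float).reshape(4, 3)       # small illustrative matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # s is a 1-D vector of singular values

Sigma = np.diag(s)                 # rebuild the diagonal Sigma matrix from the vector
A_rebuilt = U @ Sigma @ Vt         # product of the three factors

print(np.allclose(A, A_rebuilt))   # True: U * Sigma * V^T reproduces A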

What does SVD represent?

In linear algebra, the Singular Value Decomposition (SVD) of a matrix is a factorization of that matrix into three matrices. It has some interesting algebraic properties and conveys important geometrical and theoretical insights about linear transformations.


1 Answer

As you stated above, the matrix M can be decomposed as a product of 3 matrices: U * S * V*. The geometric meaning is this: any linear transformation can be viewed as a sequence of a rotation (V*), a scaling (S), and another rotation (U). There are good descriptions and animations of this online.
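A quick numerical sanity check of that picture (on random data, with shapes chosen arbitrarily): U and V come out orthogonal, i.e. pure rotations/reflections, and S only scales:

import numpy as np

rng = np.random.default_rng(42)
M = rng.normal(size=(6, 4))               # arbitrary example matrix

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# orthogonality: the transpose of U (and of V) is its inverse, as for any rotation/reflection
print(np.allclose(U.T @ U, np.eye(4)))      # True
print(np.allclose(Vt @ Vt.T, np.eye(4)))    # True

# the diagonal S in between only stretches along the rotated axes
print(np.allclose(U @ np.diag(s) @ Vt, M))  # True: rotate, scale, rotate again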

What matters for us here is that the matrix S is diagonal: all of its values off the main diagonal are 0.

Like:

np.diag(s)

array([[ 2.00604441,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  1.22160478,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  1.09816315,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.97748473,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.81374786,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.77634993,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.73250287,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.65854628,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.27985695,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.        ,  0.09252313]])

Geometrically, each value is a scaling factor along a particular axis. For our purposes (classification and regression), these values show how much each axis contributes to the overall result.
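One common way to quantify that contribution (not from the answer itself, just a standard convention) is to look at each squared singular value as a fraction of their total:

import numpy as np

# assuming s is the vector of singular values computed from X above
explained = s**2 / np.sum(s**2)
for i, frac in enumerate(explained):
    print(f"axis {i}: {frac:.1%}")   # share of the total "energy" captured by each axis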

As you can see, these values decrease from 2.0 down to 0.09. One of the most important applications of this is easy low-rank matrix approximation to a given precision. If you do not need an ultra-precise decomposition (which is true for most ML problems), you can drop the smallest values and keep only the important ones. This lets you refine your solution step by step: estimate quality on a test set, drop the smallest values, and repeat. The result is a simple and robust solution.


Here, good candidates to drop are singular values 8 and 9 (the two smallest), then 5-7, and as a last resort you could approximate the data with only the first value (see the sketch below).
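A minimal sketch of that pruning idea, assuming X, U, s and V are the arrays from the question (note that numpy actually returns the third factor already transposed, so its rows are the right singular vectors); the helper name and the chosen ranks are only illustrative:

import numpy as np

def low_rank_approx(U, s, V, k):
    """Keep only the k largest singular values and rebuild an approximation of X."""
    # with full_matrices=True, U is (n, n) and V is (m, m): take the first k columns/rows
    return U[:, :k] @ np.diag(s[:k]) @ V[:k, :]

for k in (10, 8, 5, 1):               # e.g. drop values 8-9, then 5-7, then keep only the first
    X_k = low_rank_approx(U, s, V, k)
    err = np.linalg.norm(X - X_k) / np.linalg.norm(X)
    print(f"rank {k}: relative reconstruction error {err:.3f}")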

answered Oct 29 '22 by Eugene Lisitsky