I want to apply a partial Tucker decomposition to reduce the MNIST image tensor dataset of shape (60000, 28, 28), in order to preserve its features when applying another machine learning algorithm afterwards, such as SVM. I have this code that reduces the second and third dimensions of the tensor:
import tensorly as tl
from tensorly.decomposition import partial_tucker

i = 16
j = 10
core, factors = partial_tucker(train_data_mnist, modes=[1, 2], tol=10e-5, rank=[i, j])
train_data_partial_tucker = tl.tenalg.multi_mode_dot(train_data_mnist, factors,
                                                     modes=[1, 2], transpose=True)
test_data_partial_tucker = tl.tenalg.multi_mode_dot(test_data_mnist, factors,
                                                    modes=[1, 2], transpose=True)
How do I find the best rank [i, j] when using partial_tucker in tensorly, i.e. the rank that gives the best dimensionality reduction for the images while preserving as much information as possible?
The Tucker decomposition (Tucker, 1966) decomposes a tensor into a core tensor multiplied by a matrix along each mode (i.e., transformed via a $k$-mode product for every $k = 1, 2, \ldots, N$): $X = G \times_1 A^{(1)} \times_2 A^{(2)} \times_3 \cdots \times_N A^{(N)}$.
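For concreteness, here is a minimal sketch (not from the original post) of a full Tucker decomposition and its reconstruction in tensorly; the shapes and ranks are arbitrary examples:

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Minimal sketch: full Tucker decomposition of a small random 3-way tensor
X = tl.tensor(np.random.rand(4, 5, 6))
core, factors = tucker(X, rank=[2, 3, 3])

# Reconstruction G x_1 A(1) x_2 A(2) x_3 A(3), as in the formula above
X_hat = tl.tenalg.multi_mode_dot(core, factors)
print(X_hat.shape)  # (4, 5, 6): same shape as X, lower multilinear rank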
TensorLy is an open-source Python library that eases the task of performing tensor operations. It provides a high-level API for dealing with deep tensorized neural networks and tensor methods. It was created in 2015 by Jean Kossaifi, a senior research scientist at NVIDIA Research.
In mathematics, the Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor. It is named after Ledyard R. Tucker, although it goes back to Hitchcock in 1927.
Other parameters of partial_tucker, per the tensorly documentation: an option to fix factors that is only valid if a Tucker tensor is provided as init; svd, the function to use to compute the SVD, with acceptable values in tensorly.SVD_FUNS; tol, a tolerance such that the algorithm stops when the variation in the reconstruction error is less than it; and mask, an array of booleans with the same shape as the tensor that should be 0 where the values are missing and 1 everywhere else.
Tensor rank decomposition is another type of tensor decomposition, also known as canonical polyadic decomposition (CPD). CPD can be regarded as a generalization of the matrix SVD to tensors.
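As an aside, a minimal hedged sketch of a CP decomposition in tensorly (assuming a version where parafac returns a (weights, factors) pair); the rank and shapes are arbitrary:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# A rank-5 CP decomposition expresses the tensor as a sum of 5 rank-one
# terms, much like keeping the top singular vectors in a matrix SVD
X = tl.tensor(np.random.rand(10, 10, 10))
weights, factors = parafac(X, rank=5)
print([f.shape for f in factors])  # [(10, 5), (10, 5), (10, 5)]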
Just like principal component analysis, the partial Tucker decomposition will give better results as we increase the rank, in the sense that the optimal mean-square residual of the reconstruction is smaller. In general, features (the core tensor) that enable accurate reconstructions of the original data can be used to make similar predictions: given any model, we can prepend a transformation that reconstructs the original data from the core features.
import mxnet as mx
import numpy as np
import tensorly as tl
import matplotlib.pyplot as plt
import tensorly.decomposition

# Load data
mnist = mx.test_utils.get_mnist()
train_data = mnist['train_data'][:, 0]  # (60000, 28, 28), drop the channel axis

err = np.zeros([28, 28])  # here I will save the errors for each rank
batch = train_data[::100]  # process only 1% of the data to go faster
for i in range(1, 28):
    for j in range(1, 28):
        if err[i, j] == 0:
            # Decompose the data
            core, factors = tl.decomposition.partial_tucker(
                batch, modes=[1, 2], tol=10e-5, rank=[i, j])
            # Reconstruct data from features
            c = tl.tenalg.multi_mode_dot(core, factors, modes=[1, 2])
            # Calculate the RMS error and save
            err[i, j] = np.sqrt(np.mean((c - batch)**2))

# Plot the statistics
plt.figure(figsize=(9, 6))
CS = plt.contour(np.log2(err), levels=np.arange(-6, 0))
plt.clabel(CS, CS.levels, inline=True, fmt='$2^{%d}$', fontsize=16)
plt.xlabel('rank 2')
plt.ylabel('rank 1')
plt.grid()
plt.title('Reconstruction RMS error')
Usually you get better results with a balanced rank, i.e. with i and j not very different from each other. As we allow a larger error we can get better compression; we can sort the (i, j) pairs by error and plot only those where the error is minimal for a given feature dimension i * j, like this:
X = np.zeros([28, 28])
X[...] = np.nan
p = 28 * 28
for e, i, j in sorted([(err[i, j], i, j) for i in range(1, 28) for j in range(1, 28)]):
    if p < i * j:
        # we can achieve this error with some better compression
        pass
    else:
        p = i * j
        X[i, j] = e
plt.imshow(X)
Anywhere in the white region you are wasting resources; the sensible choices lie along the plotted error/compression frontier.
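To connect this back to the SVM step the question asks about, here is a hedged sketch of how one might pick a rank from this frontier and train a classifier on the flattened core features; the rank (16, 10) and the choice of scikit-learn's LinearSVC are illustrative assumptions, and it reuses train_data and mnist from the code above:

from sklearn.svm import LinearSVC

# Illustrative sketch (not from the original answer): decompose at a rank
# picked from the frontier -- (16, 10) here is an arbitrary example
core, factors = tl.decomposition.partial_tucker(
    train_data, modes=[1, 2], tol=10e-5, rank=[16, 10])

# Project the images onto the factor bases to get (n, 16, 10) core features
features = tl.tenalg.multi_mode_dot(train_data, factors,
                                    modes=[1, 2], transpose=True)

# Flatten to (n, 160) vectors and fit a linear SVM on the labels
n = features.shape[0]
clf = LinearSVC()
clf.fit(features.reshape(n, -1), mnist['train_label'])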
So if you look at the source code for tensorly linked here, you can see that the documentation for the function in question, partial_tucker, says:
"""
Partial tucker decomposition via Higher Order Orthogonal Iteration (HOI)
Decomposes 'tensor' into a Tucker decomposition exclusively along
the provided modes.
Parameters
----------
tensor: ndarray
modes: int list
list of the modes on which to perform the decomposition
rank: None, int or int list
size of the core tensor,
if int, the same rank is used for all modes
"""
The purpose of this function is to give you the approximation that conserves as much data as possible for a given rank. I cannot tell you which rank "will give the best dimension reduction for the image while conserving as much data," because the optimal tradeoff between dimensionality reduction and loss of precision has no objectively "correct" answer in the abstract; it depends largely on the specific goals of your project and the computational resources available to achieve them.
If I told you to use "the best rank," it would defeat the purpose of this approximate decomposition in the first place, because "the best rank" would be the rank that gives no loss, which is no longer a fixed-rank approximation and renders the term "approximation" meaningless. But how far to stray from this lossless rank in order to gain dimensionality reduction is not a question anyone else can answer for you objectively. One could certainly give an opinion, but that opinion would depend on much more information than I have from you at the moment. If you are looking for a more in-depth perspective on this tradeoff and which tradeoff suits you best, I would suggest posting a question with more detail about your situation on a site in the Stack network more focused on the mathematical/statistical underpinnings of dimensionality reduction, and less on the programming aspects Stack Overflow covers, such as Cross Validated or perhaps Data Science Stack Exchange.