Partial Tucker decomposition

I want to apply a partial Tucker decomposition to reduce the MNIST image tensor dataset of shape (60000, 28, 28), in order to conserve its features when applying another machine learning algorithm afterwards, like SVM. I have this code that reduces the second and third dimensions of the tensor:

import tensorly as tl
from tensorly.decomposition import partial_tucker

i = 16
j = 10
core, factors = partial_tucker(train_data_mnist, modes=[1, 2], tol=10e-5, rank=[i, j])
train_data_partial_tucker = tl.tenalg.multi_mode_dot(train_data_mnist, factors,
                                                     modes=[1, 2], transpose=True)
test_data_partial_tucker = tl.tenalg.multi_mode_dot(test_data_mnist, factors,
                                                    modes=[1, 2], transpose=True)

How do I find the best rank [i, j] when using partial_tucker in TensorLy, i.e. the rank that gives the best dimensionality reduction for the images while conserving as much information as possible?

asked Dec 23 '21 by user1655410


2 Answers

Just like principal component analysis, the partial Tucker decomposition gives better results as we increase the rank, in the sense that the optimal mean square residual of the reconstruction gets smaller.

In general, features (the core tensor) that enable accurate reconstruction of the original data can be used to make similar predictions: given any model, we can prepend a transformation that reconstructs the original data from the core features.

import mxnet as mx
import numpy as np
import tensorly as tl
import matplotlib.pyplot as plt
import tensorly.decomposition

# Load data
mnist = mx.test_utils.get_mnist()
train_data = mnist['train_data'][:,0]


err = np.zeros([28,28]) # here I will save the errors for each rank
batch = train_data[::100] # process only 1% of the data to go faster
for i in range(1,28):
  for j in range(1,28):
    if err[i,j] == 0:
      # Decompose the data
      core, factors = tl.decomposition.partial_tucker(
                        batch, modes=[1,2], tol=10e-5, rank=[i,j])
      # Reconstruct data from features
      c = tl.tenalg.multi_mode_dot(core, factors, modes=[1, 2])
      # Calculate the RMS error and save it
      err[i, j] = np.sqrt(np.mean((c - batch)**2))

# Plot the statistics
plt.figure(figsize=(9, 6))
CS = plt.contour(np.log2(err), levels=np.arange(-6, 0))
plt.clabel(CS, CS.levels, inline=True, fmt='$2^{%d}$', fontsize=16)
plt.xlabel('rank 2')
plt.ylabel('rank 1')
plt.grid()
plt.title('Reconstruction RMS error')

[Figure: contour plot of the reconstruction RMS error as a function of rank 1 and rank 2]
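To connect this back to the original question, here is a hypothetical sketch (my addition, not part of the original answer) of feeding the core features to an SVM once a rank is chosen; it assumes scikit-learn is available and reuses the batch and mnist variables from above:

from sklearn.svm import SVC

# Decompose at the chosen rank and flatten the core tensor into feature vectors
core, factors = tl.decomposition.partial_tucker(
                  batch, modes=[1, 2], tol=10e-5, rank=[16, 10])
labels = mnist['train_label'][::100]   # labels matching the 1% batch
clf = SVC().fit(core.reshape(len(core), -1), labels)

# Project the test images onto the same factors before predicting
test = mnist['test_data'][:, 0]
test_core = tl.tenalg.multi_mode_dot(test, factors, modes=[1, 2], transpose=True)
print(clf.score(test_core.reshape(len(test_core), -1), mnist['test_label']))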

Usually you get better results with a balanced rank, i.e. with i and j not very different from each other.

As we allow a larger error we can get better compression. We can sort the (i, j) pairs by error and plot only the pairs where the error is the minimum achievable for a given feature dimension i * j, like this:

X = np.zeros([28, 28])
X[...] = np.nan
p = 28 * 28
# Visit the rank pairs in order of increasing error
for e, i, j in sorted([(err[i, j], i, j) for i in range(1, 28) for j in range(1, 28)]):
  if p < i * j:
    # we can achieve this error with some better compression: skip this pair
    pass
  else:
    p = i * j
    X[i, j] = e
plt.imshow(X)

[Figure: map of the Pareto-optimal rank pairs; non-optimal choices appear as the white region]
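As a small follow-up sketch (my own addition), the Pareto-optimal rank pairs can be read off the X array computed above:

# The non-NaN entries of X are the rank pairs worth considering
pareto = list(zip(*np.where(~np.isnan(X))))
for i, j in pareto:
  print(f'rank=[{i},{j}]  features={i * j}  RMS error={X[i, j]:.4f}')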

Anywhere in the white region you are wasting resources: some other rank achieves a lower error with the same or a smaller feature dimension, so the choice should be made among the plotted pairs.

answered Oct 19 '22 by Bob


So if you look at the source code for tensorly linked here, you can see that the documentation for the function in question, partial_tucker, says:

"""
Partial tucker decomposition via Higher Order Orthogonal Iteration (HOI)
Decomposes 'tensor' into a Tucker decomposition exclusively along 
the provided modes.

Parameters
----------
tensor: ndarray
modes: int list
list of the modes on which to perform the decomposition
rank: None, int or int list
size of the core tensor, 
if int, the same rank is used for all modes
"""

The purpose of this function is to give you the approximation which conserves as much data as possible for a given rank. I cannot tell you which rank "will give the best dimension reduction for the image while conserving as much data," because the optimal tradeoff between dimensionality reduction and loss of precision has no objectively "correct" answer in the abstract: it largely depends on the specific goals of your project and the computational resources available to you to achieve those goals.

If I told you "the best rank," it would defeat the purpose of this approximate decomposition in the first place: "the best rank" would be the rank that gives no loss at all, which is no longer an approximation of fixed rank and renders the term approximation meaningless. How far to stray from this lossless rank in order to gain dimensionality reduction is not a question anyone else can answer for you objectively. One could certainly give an opinion, but that opinion would depend on much more information than I have from you at the moment. If you are looking for a more in-depth perspective on this tradeoff and which tradeoff suits you best, I would suggest posting a question with more detail on your situation on a site in the Stack network that focuses on the mathematical/statistical underpinnings of dimensionality reduction rather than on the programming aspects Stack Overflow covers, such as Cross Validated or Data Science on Stack Exchange.

Sources/References/Further Reading:

  1. Blog/Article on Tucker Decompositions
  2. "Some mathematical notes on three-mode factor analysis" - paper on the Tucker decomposition by Ledyard R. Tucker
  3. "A Multilinear Singular Value Decomposition" by De Lathauwer et al.
  4. "On the Best Rank-1 and Rank-(R1, R2, ..., RN) Approximation of Higher-Order Tensors" by De Lathauwer et al.
answered Oct 19 '22 by Phoenix