I want to apply a partial Tucker decomposition to reduce the MNIST image tensor dataset of shape (60000, 28, 28), in order to preserve its features when applying another machine learning algorithm afterwards, such as SVM. I have this code that reduces the second and third dimensions of the tensor:
import tensorly as tl
from tensorly.decomposition import partial_tucker

i = 16
j = 10
core, factors = partial_tucker(train_data_mnist, modes=[1, 2], tol=10e-5, rank=[i, j])
train_data_partial_tucker = tl.tenalg.multi_mode_dot(train_data_mnist, factors,
                                                     modes=[1, 2], transpose=True)
test_data_partial_tucker = tl.tenalg.multi_mode_dot(test_data_mnist, factors,
                                                    modes=[1, 2], transpose=True)
How do I find the best rank [i, j] when using partial_tucker in tensorly, i.e. the rank that gives the best dimensionality reduction for the images while preserving as much information as possible?
The Tucker decomposition (Tucker, 1966) decomposes a tensor into a core tensor multiplied by a matrix along each mode (i.e., transformed via a $k$-mode product for every $k = 1, 2, \ldots, N$): $X = G \times_1 A^{(1)} \times_2 A^{(2)} \times_3 \cdots \times_N A^{(N)}$.
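For concreteness, here is a minimal sketch (not from the original post) of a full Tucker decomposition and its reconstruction in tensorly; the shapes and ranks are arbitrary examples:

import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Minimal sketch: full Tucker decomposition of a small random 3-way tensor
X = tl.tensor(np.random.rand(4, 5, 6))
core, factors = tucker(X, rank=[2, 3, 3])

# Reconstruction G x_1 A(1) x_2 A(2) x_3 A(3), as in the formula above
X_hat = tl.tenalg.multi_mode_dot(core, factors)
print(X_hat.shape)  # (4, 5, 6): same shape as X, lower multilinear rank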
TensorLy is an open-source Python library that eases the task of performing tensor operations. It provides a high-level API for dealing with deep tensorized neural networks and tensor methods. It was created in 2015 by Jean Kossaifi, a senior research scientist at NVIDIA Research.
In mathematics, the Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor. It is named after Ledyard R. Tucker, although it goes back to Hitchcock in 1927.
Other parameters of partial_tucker, per the tensorly documentation: an option to fix factors that is only valid if a Tucker tensor is provided as init; svd, the function to use to compute the SVD, with acceptable values in tensorly.SVD_FUNS; tol, a tolerance such that the algorithm stops when the variation in the reconstruction error is less than it; and mask, an array of booleans with the same shape as the tensor that should be 0 where the values are missing and 1 everywhere else.
Tensor rank decomposition is another type of tensor decomposition, also known as canonical polyadic decomposition (CPD). CPD can be regarded as a generalization of the matrix SVD to tensors.
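As an aside, a minimal hedged sketch of a CP decomposition in tensorly (assuming a version where parafac returns a (weights, factors) pair); the rank and shapes are arbitrary:

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

# A rank-5 CP decomposition expresses the tensor as a sum of 5 rank-one
# terms, much like keeping the top singular vectors in a matrix SVD
X = tl.tensor(np.random.rand(10, 10, 10))
weights, factors = parafac(X, rank=5)
print([f.shape for f in factors])  # [(10, 5), (10, 5), (10, 5)]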
Just like principal component analysis, the partial Tucker decomposition will give better results as we increase the rank, in the sense that the optimal mean-square residual of the reconstruction is smaller. In general, features (the core tensor) that enable accurate reconstructions of the original data can be used to make similar predictions: given any model, we can prepend a transformation that reconstructs the original data from the core features.
import mxnet as mx
import numpy as np
import tensorly as tl
import matplotlib.pyplot as plt
import tensorly.decomposition

# Load data
mnist = mx.test_utils.get_mnist()
train_data = mnist['train_data'][:, 0]  # (60000, 28, 28), drop the channel axis

err = np.zeros([28, 28])  # here I will save the errors for each rank
batch = train_data[::100]  # process only 1% of the data to go faster
for i in range(1, 28):
    for j in range(1, 28):
        if err[i, j] == 0:
            # Decompose the data
            core, factors = tl.decomposition.partial_tucker(
                batch, modes=[1, 2], tol=10e-5, rank=[i, j])
            # Reconstruct data from features
            c = tl.tenalg.multi_mode_dot(core, factors, modes=[1, 2])
            # Calculate the RMS error and save
            err[i, j] = np.sqrt(np.mean((c - batch)**2))

# Plot the statistics
plt.figure(figsize=(9, 6))
CS = plt.contour(np.log2(err), levels=np.arange(-6, 0))
plt.clabel(CS, CS.levels, inline=True, fmt='$2^{%d}$', fontsize=16)
plt.xlabel('rank 2')
plt.ylabel('rank 1')
plt.grid()
plt.title('Reconstruction RMS error')
Usually you get better results with a balanced rank, i.e. with i and j not very different from each other. As we allow a larger error we can get better compression; we can sort the (i, j) pairs by error and plot only those where the error is minimal for a given feature dimension i * j, like this:
X = np.zeros([28, 28])
X[...] = np.nan
p = 28 * 28
for e, i, j in sorted([(err[i, j], i, j) for i in range(1, 28) for j in range(1, 28)]):
    if p < i * j:
        # we can achieve this error with some better compression
        pass
    else:
        p = i * j
        X[i, j] = e
plt.imshow(X)
Anywhere in the white region you are wasting resources; the sensible choices lie along the plotted error/compression frontier.
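To connect this back to the SVM step the question asks about, here is a hedged sketch of how one might pick a rank from this frontier and train a classifier on the flattened core features; the rank (16, 10) and the choice of scikit-learn's LinearSVC are illustrative assumptions, and it reuses train_data and mnist from the code above:

from sklearn.svm import LinearSVC

# Illustrative sketch (not from the original answer): decompose at a rank
# picked from the frontier -- (16, 10) here is an arbitrary example
core, factors = tl.decomposition.partial_tucker(
    train_data, modes=[1, 2], tol=10e-5, rank=[16, 10])

# Project the images onto the factor bases to get (n, 16, 10) core features
features = tl.tenalg.multi_mode_dot(train_data, factors,
                                    modes=[1, 2], transpose=True)

# Flatten to (n, 160) vectors and fit a linear SVM on the labels
n = features.shape[0]
clf = LinearSVC()
clf.fit(features.reshape(n, -1), mnist['train_label'])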
So if you look at the source code for tensorly linked here, you can see that the documentation for the function in question, partial_tucker, says:
"""
Partial tucker decomposition via Higher Order Orthogonal Iteration (HOI)
Decomposes 'tensor' into a Tucker decomposition exclusively along
the provided modes.
Parameters
----------
tensor: ndarray
modes: int list
list of the modes on which to perform the decomposition
rank: None, int or int list
size of the core tensor,
if int, the same rank is used for all modes
"""
The purpose of this function is to give you the approximation that conserves as much data as possible for a given rank. I cannot tell you which rank "will give the best dimension reduction for the image while conserving as much data," because the optimal tradeoff between dimensionality reduction and loss of precision has no objectively "correct" answer in the abstract; it depends largely on the specific goals of your project and the computational resources available to achieve them.
If I told you to use "the best rank," it would defeat the purpose of this approximate decomposition in the first place, because "the best rank" would be the rank that gives no loss, which is no longer a fixed-rank approximation and renders the term "approximation" meaningless. But how far to stray from this lossless rank in order to gain dimensionality reduction is not a question anyone else can answer for you objectively. One could certainly give an opinion, but that opinion would depend on much more information than I have from you at the moment. If you are looking for a more in-depth perspective on this tradeoff and which tradeoff suits you best, I would suggest posting a question with more detail about your situation on a site in the Stack network more focused on the mathematical/statistical underpinnings of dimensionality reduction, and less on the programming aspects Stack Overflow covers, such as Cross Validated or perhaps Data Science Stack Exchange.