I want to solve an eigenvalue problem using TensorFlow. In particular, I have
e, v = tf.self_adjoint_eig(laplacian, name="eigendata")
eigenmap = v[:,1:4]
so I only need a few eigenvectors and don't want to compute all of them.
In MATLAB, I would use eigs(laplacian,4,'sm').
Looking at https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/linalg_ops.py,
I see that tf.self_adjoint_eig calls gen_linalg_ops._self_adjoint_eig_v2.
However, I can't find gen_linalg_ops on GitHub or elsewhere.
Any advice on doing such linear algebra in TensorFlow, or is it best to go with other libraries in Python?
The MATLAB function EIG calculates all of the eigenvectors. The MATLAB function EIGS calculates only a selected number of eigenvectors, using the precompiled ARPACK library (https://en.wikipedia.org/wiki/ARPACK), which implements the Lanczos algorithm (https://en.wikipedia.org/wiki/Lanczos_algorithm). There is no native Lanczos code in MATLAB, most likely because the Lanczos algorithm is unavoidably unstable with respect to round-off errors, especially in single precision, which makes more stable implementations tricky and/or expensive.
An alternative to the EIGS function is lobpcg.m (https://www.mathworks.com/matlabcentral/fileexchange/48-lobpcg-m), which implements LOBPCG (https://en.wikipedia.org/wiki/LOBPCG) natively in MATLAB.
SciPy has an interface to ARPACK as well as a native Python implementation of LOBPCG: https://docs.scipy.org/doc/scipy-1.1.0/reference/generated/scipy.sparse.linalg.lobpcg.html
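For the use case in the question (a few smallest eigenpairs of a Laplacian, i.e. the equivalent of eigs(laplacian,4,'sm')), a minimal SciPy sketch might look as follows; the path-graph Laplacian here is just a stand-in for your actual matrix:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh, lobpcg

n = 100
# Stand-in matrix: discrete 1-D Laplacian (replace with your own)
laplacian = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# ARPACK, smallest-magnitude eigenpairs, like eigs(laplacian,4,'sm')
vals, vecs = eigsh(laplacian, k=4, which="SM")
eigenmap = vecs[:, 1:4]  # same slice as v[:,1:4] in the question

# LOBPCG with a random initial block; largest=False asks for the smallest
X = np.random.rand(n, 4)
vals2, vecs2 = lobpcg(laplacian, X, largest=False, maxiter=200)

The shift-invert mode of eigsh (passing sigma instead of which="SM") is closer to what MATLAB's 'sm' option actually does and converges much faster, but it requires factorizing the shifted matrix.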
scikit-learn uses ARPACK or LOBPCG for spectral embedding (http://scikit-learn.org/stable/modules/generated/sklearn.manifold.spectral_embedding.html) and for spectral clustering (http://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html).
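So if you are starting from an affinity graph rather than a prebuilt Laplacian, a short sketch (the random affinity matrix here is made up for illustration):

import numpy as np
from sklearn.manifold import spectral_embedding

# Symmetric non-negative affinity matrix, a stand-in for real graph weights
rng = np.random.RandomState(0)
A = rng.rand(100, 100)
affinity = 0.5 * (A + A.T)

# First 3 embedding coordinates from the bottom Laplacian eigenvectors;
# eigen_solver may be "arpack" or "lobpcg" (or "amg" if pyamg is installed)
embedding = spectral_embedding(affinity, n_components=3,
                               eigen_solver="arpack", random_state=0)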
TensorFlow now has a native implementation of the Lanczos algorithm: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/solvers/python/ops/lanczos.py
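For completeness, a sketch of how that contrib module could be driven; it never made it into the stable API, so take lanczos_bidiag and util.create_operator, and their signatures, as assumptions read off the linked source:

import tensorflow as tf
from tensorflow.contrib.solvers.python.ops import lanczos, util

# Assumed contrib API: wrap a dense matrix as the solver's linear operator
matrix = tf.random_normal([100, 100], seed=0)
operator = util.create_operator(matrix)

# k steps of Lanczos bidiagonalization (assumed signature); the result holds
# the Lanczos bases and bidiagonal coefficients, from which approximate
# singular/eigen pairs can be recovered
decomp = lanczos.lanczos_bidiag(operator, k=10, orthogonalize=True)

with tf.Session() as sess:
    result = sess.run(decomp)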
Current TensorFlow linalg implementations are single-core and seem to be written from scratch, so it may take some time for them to match the functionality of mature libraries like Intel's Math Kernel Library (MKL).
You could do the computation in MKL (available with the conda build of SciPy) and pass the values between MKL and TensorFlow as NumPy arrays. Since the computation scales as O(n^3) for large enough matrices, the extra cost of the data transfer is negligible compared to the computation itself.
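A minimal sketch of that round trip with the TF 1.x session API (the shapes and names here are made up): pull the tensor out as a NumPy array, factorize it with MKL-backed SciPy, and feed the result back through a placeholder.

import numpy as np
import tensorflow as tf
from scipy import linalg

n = 100
target = tf.Variable(tf.random_normal([n, n], seed=0))
s_var = tf.Variable(tf.zeros([n]))                  # cached singular values
s_holder = tf.placeholder(tf.float32, shape=[n])
update_s = s_var.assign(s_holder)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    target0 = sess.run(target)                      # TF -> NumPy
    u0, s0, vt0 = linalg.svd(target0)               # runs in LAPACK/MKL
    sess.run(update_s, feed_dict={s_holder: s0})    # NumPy -> TF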
For instance, to compute a full SVD in MKL (which, counter-intuitively, is faster than a self-adjoint eig), I use the following wrapper. It keeps the results in tf.Variable objects and lets me switch between the MKL and TensorFlow implementations:
import numpy as np
import tensorflow as tf
from collections import namedtuple
from scipy import linalg

default_dtype = tf.float32
USE_MKL_SVD = True  # TensorFlow vs MKL SVD
if USE_MKL_SVD:
    assert np.__config__.get_info("lapack_mkl_info"), "No MKL detected :("

SvdTuple = namedtuple("SvdTuple", ["s", "u", "v"])

def ones(n, name=None):
    """Vector of n ones, used as the initial singular values."""
    return tf.ones((int(n),), dtype=default_dtype, name=name)

def Identity(n, name=None):
    """n x n identity matrix, used as the initial singular vectors."""
    return tf.eye(int(n), dtype=default_dtype, name=name)

class SvdWrapper:
    """Encapsulates variables needed to perform SVD of a TensorFlow target.

    Initialize: wrapper = SvdWrapper(tensorflow_var)
    Trigger SVD: wrapper.update_tf() or wrapper.update_scipy()
    Access result as TF vars: wrapper.s, wrapper.u, wrapper.v
    """

    def __init__(self, target, name="svd"):
        self.name = name
        self.target = target
        # tf.svd returns (s, u, v); unpack into the named tuple
        self.tf_svd = SvdTuple(*tf.svd(target))
        self.init = SvdTuple(
            ones(target.shape[0], name=name + "_s_init"),
            Identity(target.shape[0], name=name + "_u_init"),
            Identity(target.shape[0], name=name + "_v_init")
        )
        assert self.tf_svd.s.shape == self.init.s.shape
        assert self.tf_svd.u.shape == self.init.u.shape
        assert self.tf_svd.v.shape == self.init.v.shape
        self.cached = SvdTuple(
            tf.Variable(self.init.s, name=name + "_s"),
            tf.Variable(self.init.u, name=name + "_u"),
            tf.Variable(self.init.v, name=name + "_v")
        )
        self.s = self.cached.s
        self.u = self.cached.u
        self.v = self.cached.v
        self.holder = SvdTuple(
            tf.placeholder(default_dtype, shape=self.cached.s.shape, name=name + "_s_holder"),
            tf.placeholder(default_dtype, shape=self.cached.u.shape, name=name + "_u_holder"),
            tf.placeholder(default_dtype, shape=self.cached.v.shape, name=name + "_v_holder")
        )
        self.update_tf_op = tf.group(
            self.cached.s.assign(self.tf_svd.s),
            self.cached.u.assign(self.tf_svd.u),
            self.cached.v.assign(self.tf_svd.v)
        )
        self.update_external_op = tf.group(
            self.cached.s.assign(self.holder.s),
            self.cached.u.assign(self.holder.u),
            self.cached.v.assign(self.holder.v)
        )
        self.init_ops = (self.s.initializer, self.u.initializer, self.v.initializer)

    def update(self):
        if USE_MKL_SVD:
            self.update_scipy()
        else:
            self.update_tf()

    def update_tf(self):
        sess = tf.get_default_session()
        sess.run(self.update_tf_op)

    def update_scipy(self):
        sess = tf.get_default_session()
        target0 = self.target.eval()
        # A = u @ diag(s) @ v', singular vectors are columns
        # TODO: catch "ValueError: array must not contain infs or NaNs"
        u0, s0, vt0 = linalg.svd(target0)
        v0 = vt0.T
        # v0 = vt0  # bug, makes loss increase, use for sanity checks
        feed_dict = {self.holder.u: u0,
                     self.holder.v: v0,
                     self.holder.s: s0}
        sess.run(self.update_external_op, feed_dict=feed_dict)
And to use it:

covariance = data @ tf.transpose(data)
svd = SvdWrapper(target=covariance)
sess.run(svd.init_ops)           # initialize factors to identity matrices
svd.update()                     # update using the latest value of covariance
sess.run([svd.s, svd.u, svd.v])  # get values of the factors