
How does TensorFlow handle complex gradients?

Let z be a complex variable and C(z) its conjugate. In complex analysis, the derivative of C(z) with respect to z does not exist. But in TensorFlow we can compute dC(z)/dz, and the result is simply 1. Here is an example:

import numpy as np
import tensorflow as tf  # TF 1.x API

x = tf.placeholder('complex64', (2, 2))
y = tf.reduce_sum(tf.conj(x))  # y = sum of the element-wise conjugates
z = tf.gradients(y, x)         # "derivative" of y w.r.t. x
sess = tf.Session()
X = np.random.rand(2, 2) + 1.j * np.random.rand(2, 2)
X = X.astype('complex64')
Z = sess.run(z, {x: X})[0]

The input X is

[[0.17014372+0.71475762j  0.57455420+0.00144318j]
 [0.57871044+0.61303568j  0.48074263+0.7623235j ]]

and the result Z is

[[1.-0.j  1.-0.j]
 [1.-0.j  1.-0.j]]

I don't understand why the gradient is 1, and I would like to know how TensorFlow handles complex gradients in general.
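(The snippet above uses the TF 1.x API; for reference, a rough TF 2.x sketch of the same experiment with `tf.GradientTape` — assuming eager execution — would be:)

```python
import numpy as np
import tensorflow as tf  # TF 2.x

# Same experiment in eager mode: gradient of sum(conj(x)) w.r.t. x.
X = (np.random.rand(2, 2) + 1.j * np.random.rand(2, 2)).astype('complex64')
x = tf.Variable(X)

with tf.GradientTape() as tape:
    y = tf.reduce_sum(tf.math.conj(x))

Z = tape.gradient(y, x).numpy()  # every entry is 1+0j, as in the TF 1.x run
```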

asked Feb 27 '17 by zhd.zhang


1 Answer

How?

The equation used by TensorFlow for the gradient is:

    grad f(z) = (∂f/∂z)* + ∂f/∂z*

where '*' means the complex conjugate.

The partial derivatives with respect to z and z* are Wirtinger derivatives. Wirtinger calculus makes it possible to differentiate non-holomorphic functions with respect to a complex variable. Writing z = x + iy, the Wirtinger derivatives are defined as:

    ∂f/∂z  = (1/2) (∂f/∂x − i ∂f/∂y)
    ∂f/∂z* = (1/2) (∂f/∂x + i ∂f/∂y)
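A quick numerical sanity check of these definitions (a sketch using NumPy central differences; `wirtinger` and `tf_grad` are hypothetical helper names, not TensorFlow API):

```python
import numpy as np

def wirtinger(f, z, h=1e-6):
    """Approximate the Wirtinger derivatives df/dz and df/dz* by central differences."""
    dfdx = (f(z + h) - f(z - h)) / (2 * h)             # ∂f/∂x
    dfdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)   # ∂f/∂y
    dz  = 0.5 * (dfdx - 1j * dfdy)   # ∂f/∂z
    dzc = 0.5 * (dfdx + 1j * dfdy)   # ∂f/∂z*
    return dz, dzc

def tf_grad(f, z):
    """Gradient following the definition above: (∂f/∂z)* + ∂f/∂z*."""
    dz, dzc = wirtinger(f, z)
    return np.conj(dz) + dzc

z0 = 0.3 + 0.7j
print(tf_grad(np.conj, z0))         # ≈ 1+0j, matching the result in the question
print(tf_grad(lambda z: z**2, z0))  # ≈ 2*conj(z0): holomorphic gradients come out conjugated
```

For C(z) = z* the Wirtinger derivatives are ∂C/∂z = 0 and ∂C/∂z* = 1, so the definition yields exactly the 1 seen in the question.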

Why this definition?

When working, for example, with Complex-Valued Neural Networks (CVNNs), gradients are taken of a non-holomorphic, real-valued scalar loss function of one or several complex variables. For a real-valued f we have ∂f/∂z* = (∂f/∂z)*, so TensorFlow's definition of the gradient reduces to:

    grad f(z) = 2 ∂f/∂z* = 2 (∂f/∂z)*

This definition agrees with the CVNN literature, for example chapter 4, section 4.3 of this book, or Amin et al. (among countless examples).
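As a concrete worked case (NumPy only; `grad_abs2` is a hypothetical helper name): for the real-valued loss f(z) = |z|² = z·z*, the Wirtinger derivatives are ∂f/∂z = z* and ∂f/∂z* = z, so the definition gives 2z.

```python
import numpy as np

# Real-valued loss f(z) = |z|^2 = z * conj(z).
# Wirtinger derivatives: df/dz = conj(z), df/dz* = z.
def grad_abs2(z):
    dz  = np.conj(z)          # ∂f/∂z
    dzc = z                   # ∂f/∂z*
    return np.conj(dz) + dzc  # (∂f/∂z)* + ∂f/∂z* = 2z

z0 = 0.5 - 1.2j
print(grad_abs2(z0))  # (1-2.4j), i.e. 2*z0
```

With this convention, `z -= lr * grad` is a steepest-descent step for a real-valued loss, since the descent direction of a real f is −∂f/∂z*.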

answered Nov 01 '22 by Agustin Barrachina