I am currently trying to train a network that takes a complex-valued tensor as input and produces a complex-valued tensor as output. As the loss function, I take the norm of the pointwise difference between the output and the ground truth.
When I try to minimize the loss, TensorFlow's minimize function complains about unexpected complex numbers. I find this strange, since I expected TensorFlow to handle back-prop through complex numbers. I also explicitly checked that the loss value itself is a real-valued tensor.
The reason I am stuck is that the error occurs deep inside TensorFlow's code and seems to be based on the dtypes of the gradients, so it is hard to see what exactly happens under the hood and how these gradient calculations are supposed to work. Can anyone help me figure out how complex-valued networks are supposed to be trained with TensorFlow?
Here is a minimal self-contained example. It has a single complex fully-connected layer and contains all code up to the minimize call; the corresponding error message follows below it:
import tensorflow as tf

def do_training():
    # Create placeholders for potential training data/labels
    train_data_node = tf.placeholder(tf.complex64,
                                     shape=(25, 10),
                                     name="train_data_node")
    train_labels_node = tf.placeholder(tf.complex64,
                                       shape=(25, 10),
                                       name="train_labels_node")

    # Create and initialise the weights
    weights = {
        'fc_w1': tf.Variable(tf.complex(tf.random_normal([10, 10], stddev=0.01, dtype=tf.float32),
                                        tf.random_normal([10, 10], stddev=0.01, dtype=tf.float32))),
        'fc_b1': tf.Variable(tf.complex(tf.random_normal([10]), tf.random_normal([10]))),
    }

    prediction = model(train_data_node, weights)
    loss = tf.real(tf.norm(prediction - train_labels_node))
    train_op = tf.train.AdamOptimizer(learning_rate=1.0).minimize(loss)

def model(data, weights):
    l1 = tf.matmul(data, weights['fc_w1'])  # fully-connected layer
    l1 = l1 + weights['fc_b1']
    return l1
And the error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/myFolder/training.py", line 23, in do_training
    train_op = tf.train.AdamOptimizer(learning_rate=1.0).minimize(loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 315, in minimize
    grad_loss=grad_loss)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 392, in compute_gradients
    if g is not None and v.dtype != dtypes.resource])
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/optimizer.py", line 517, in _assert_valid_dtypes
    dtype, t.name, [v for v in valid_dtypes]))
ValueError: Invalid type tf.complex64 for Variable:0, expected: [tf.float32, tf.float64, tf.float16].
Edit: I tried replacing the complex weights with real-valued ones. This required casting those weights to complex values inside the fully-connected layer before the multiplication. That worked, so my current hypothesis is that TensorFlow does not support gradient calculations on complex variables. Can anyone confirm this?
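For what it's worth, the workaround rests on the fact that a complex matrix multiply can be expressed entirely with real matrices, which is why real-valued variables suffice: (A + iB)(Wr + iWi) = (A·Wr - B·Wi) + i(A·Wi + B·Wr). A quick numpy sketch verifying that identity (shapes chosen to match the example above; the variable names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Complex input, batch of 25 with 10 features, as in the placeholder above.
data = rng.normal(size=(25, 10)) + 1j * rng.normal(size=(25, 10))

# Real-valued parameters -- the kind TensorFlow's optimizers accept...
w_real = rng.normal(size=(10, 10))
w_imag = rng.normal(size=(10, 10))

# ...cast to a single complex weight only inside the layer.
w_complex = w_real + 1j * w_imag

# The same product, computed with four real matmuls:
a, b = data.real, data.imag
out_real = a @ w_real - b @ w_imag
out_imag = a @ w_imag + b @ w_real

direct = data @ w_complex
assert np.allclose(direct, out_real + 1j * out_imag)
```

So keeping the trainable variables real and building the complex value on the fly loses no expressive power; the optimizer just sees twice as many real parameters.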
You already have your confirmation in the error message itself. In the source code, _assert_valid_dtypes relies on:
def _valid_dtypes(self):
    """Valid types for loss, variables and gradients.

    Subclasses should override to allow other float types.

    Returns:
      Valid types for loss, variables and gradients.
    """
    return set([dtypes.float16, dtypes.float32, dtypes.float64])
This is exactly what the error message tells you: complex dtypes are simply not in the optimizer's set of valid variable types.
This is not the only place where TensorFlow struggles with complex values; operations such as tf.reduce_prod have had gradient problems too. Your workaround of keeping the variables real and casting to complex inside the model is the usual approach.
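The split-real workaround from your edit is also mathematically sound. For a squared-norm loss, the gradients with respect to the real and imaginary parts of the weights are just 2·Re and 2·Im of the complex quantity Aᴴ(AW - Y), so plain gradient descent on the two real matrices minimizes the complex-valued fit. A hand-rolled numpy sketch (illustrative only; shapes, learning rate, and step count are made-up assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(25, 10)) + 1j * rng.normal(size=(25, 10))  # input
Y = rng.normal(size=(25, 10)) + 1j * rng.normal(size=(25, 10))  # target

# Real-valued trainable parameters, as in the workaround.
Wr = np.zeros((10, 10))
Wi = np.zeros((10, 10))

def loss(Wr, Wi):
    # Squared norm of the pointwise difference, a real number.
    E = A @ (Wr + 1j * Wi) - Y
    return np.sum(np.abs(E) ** 2)

lr = 1e-3
prev = loss(Wr, Wi)
for _ in range(200):
    E = A @ (Wr + 1j * Wi) - Y
    G = A.conj().T @ E          # complex residual gradient A^H (AW - Y)
    Wr -= lr * 2 * np.real(G)   # d loss / d Wr = 2 Re(G)
    Wi -= lr * 2 * np.imag(G)   # d loss / d Wi = 2 Im(G)

assert loss(Wr, Wi) < prev
```

This is exactly the computation TensorFlow would perform once the variables are stored as real tensors, which is why the cast-inside-the-layer trick trains without trouble.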