
Siamese Neural Network in TensorFlow

I'm trying to implement a Siamese neural network in TensorFlow, but I can't find a working example online (see the Yann LeCun paper).


The architecture I'm trying to build would consist of two LSTMs sharing weights and only connected at the end of the network.

My question is: how can I build two neural networks that share their weights (tied weights) in TensorFlow, and how do I connect them at the end?

Thanks :)

Edit: I implemented a simple and working example of a siamese network here on MNIST.

BiBi asked Apr 25 '16 15:04





1 Answer

Update with tf.layers

If you use the tf.layers module (TensorFlow 1.x) to build your network, you can simply pass reuse=True when building the second branch of the Siamese network:

    import tensorflow as tf  # TensorFlow 1.x API

    x = tf.ones((1, 3))
    y1 = tf.layers.dense(x, 4, name='h1')
    y2 = tf.layers.dense(x, 4, name='h1', reuse=True)  # reuses the variables of 'h1'

    # y1 and y2 will evaluate to the same values
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    print(sess.run(y1))
    print(sess.run(y2))  # both prints will return the same values

Old answer with tf.get_variable

You can try using the function tf.get_variable(). (See the tutorial)

Implement the first network using a variable scope with reuse=False:

    with tf.variable_scope('Inference', reuse=False):
        weights_1 = tf.get_variable('weights', shape=[1, 1],
                                    initializer=...)
        output_1 = weights_1 * input_1

Then implement the second network with the same code, except with reuse=True:

    with tf.variable_scope('Inference', reuse=True):
        weights_2 = tf.get_variable('weights')
        output_2 = weights_2 * input_2

The first implementation will create and initialize every variable of the LSTM, whereas the second implementation will use tf.get_variable() to get the same variables used in the first network. That way, variables will be shared.

Then use whatever loss you want (e.g. the L2 distance between the outputs of the two Siamese branches). The gradients will backpropagate through both branches, and the shared variables are updated with the sum of the gradients from the two branches.
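The "sum of the gradients" claim can be checked by hand on the scalar model used above (output = weights * input). The sketch below assumes a squared L2 distance as the loss and uses plain Python rather than TensorFlow; the function names and the sample values are made up for the illustration.

```python
def siamese_loss(w1, w2, x1, x2):
    """Squared L2 distance between two scalar branches with weights w1 and w2."""
    return (w1 * x1 - w2 * x2) ** 2


def branch_gradients(w, x1, x2):
    """Analytic gradient of the loss with respect to each branch's copy of w."""
    diff = w * x1 - w * x2
    grad_branch_1 = 2.0 * diff * x1    # d loss / d w1, evaluated at w1 = w2 = w
    grad_branch_2 = -2.0 * diff * x2   # d loss / d w2, evaluated at w1 = w2 = w
    return grad_branch_1, grad_branch_2


def numeric_shared_gradient(w, x1, x2, eps=1e-6):
    """Central finite difference when both branches use the same w."""
    up = siamese_loss(w + eps, w + eps, x1, x2)
    down = siamese_loss(w - eps, w - eps, x1, x2)
    return (up - down) / (2.0 * eps)


w, x1, x2 = 0.5, 2.0, 3.0  # arbitrary sample values
g1, g2 = branch_gradients(w, x1, x2)
shared_grad = g1 + g2                              # what the shared weight receives
numeric_grad = numeric_shared_gradient(w, x1, x2)  # independent check

print(g1, g2, shared_grad)
print(abs(shared_grad - numeric_grad) < 1e-6)
```

Perturbing the single shared weight changes both branches at once, which is why the finite-difference gradient matches the sum of the two per-branch gradients rather than either one alone.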

Olivier Moindrot answered Sep 22 '22 01:09
