 

Siamese network: lower part uses a Dense layer instead of a Euclidean distance layer

This is a rather interesting question about Siamese networks.

I am following the example from https://keras.io/examples/mnist_siamese/. My modified version of the code is in this Google Colab.

The Siamese network takes in two inputs (two handwritten digits) and outputs whether they are the same digit (1) or not (0).

Each of the two inputs is first processed by a shared base_network (3 Dense layers with 2 Dropout layers in between). input_a is extracted into processed_a, and input_b into processed_b.
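For context, here is a minimal sketch of that wiring following the question's description. The 128-unit layers and 0.1 dropout rate are taken from the linked Keras example; the flattened 784-dimensional input shape is an assumption:

from keras.layers import Input, Dense, Dropout
from keras.models import Model

def create_base_network(input_shape):
    '''Shared feature extractor: 3 Dense layers with 2 Dropout layers in between.'''
    inp = Input(shape=input_shape)
    x = Dense(128, activation='relu')(inp)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.1)(x)
    x = Dense(128, activation='relu')(x)
    return Model(inp, x)

input_shape = (784,)                 # flattened 28x28 digit (assumed)
base_network = create_base_network(input_shape)

input_a = Input(shape=input_shape)
input_b = Input(shape=input_shape)
processed_a = base_network(input_a)  # the same weights process
processed_b = base_network(input_b)  # both inputs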

The last layer of the Siamese network is a Euclidean distance layer between the two extracted tensors:

distance = Lambda(euclidean_distance,
                  output_shape=eucl_dist_output_shape)([processed_a, processed_b])

model = Model([input_a, input_b], distance)
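
For completeness, the two helper functions referenced by the Lambda layer are defined in the linked example as:

def euclidean_distance(vects):
    x, y = vects
    # K.epsilon() keeps sqrt away from 0, avoiding an undefined gradient
    sum_square = K.sum(K.square(x - y), axis=1, keepdims=True)
    return K.sqrt(K.maximum(sum_square, K.epsilon()))

def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)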

I understand the reasoning behind using a Euclidean distance layer for the lower part of the network: if the features are extracted nicely, then similar inputs should have similar features.

I am thinking: why not use a normal Dense layer for the lower part instead, like this:

# distance = Lambda(euclidean_distance,
#                   output_shape=eucl_dist_output_shape)([processed_a, processed_b])

# model = Model([input_a, input_b], distance)

# my model
subtracted = Subtract()([processed_a, processed_b])
out = Dense(1, activation="sigmoid")(subtracted)
model = Model([input_a,input_b], out)

My reasoning is that if the extracted features are similar, then the Subtract layer should produce a small tensor (the difference between the extracted features). The next layer, a Dense layer, can then learn to output 1 if its input is small and 0 otherwise.

Because the Euclidean distance layer outputs a value close to 0 when the two inputs are similar and close to 1 otherwise, I also need to invert the accuracy and loss functions:

# the version of loss and accuracy for Euclidean distance layer
# def contrastive_loss(y_true, y_pred):
#     '''Contrastive loss from Hadsell-et-al.'06
#     http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf
#     '''
#     margin = 1
#     square_pred = K.square(y_pred)
#     margin_square = K.square(K.maximum(margin - y_pred, 0))
#     return K.mean(y_true * square_pred + (1 - y_true) * margin_square)

# def compute_accuracy(y_true, y_pred):
#     '''Compute classification accuracy with a fixed threshold on distances.
#     '''
#     pred = y_pred.ravel() < 0.5
#     return np.mean(pred == y_true)

# def accuracy(y_true, y_pred):
#     '''Compute classification accuracy with a fixed threshold on distances.
#     '''
#     return K.mean(K.equal(y_true, K.cast(y_pred < 0.5, y_true.dtype)))

### my version of the loss and accuracy
def contrastive_loss(y_true, y_pred):
  '''Contrastive loss with the two terms swapped,
  since here y_pred close to 1 means "same class".
  '''
  margin = 1
  square_pred = K.square(y_pred)
  margin_square = K.square(K.maximum(margin - y_pred, 0))
#   return K.mean(y_true * square_pred + (1-y_true) * margin_square)
  return K.mean(y_true * margin_square + (1-y_true) * square_pred)

def compute_accuracy(y_true, y_pred):
  '''Compute classification accuracy with a fixed threshold on distances.
  '''
  pred = y_pred.ravel() > 0.5
  return np.mean(pred == y_true)

def accuracy(y_true, y_pred):
  '''Compute classification accuracy with a fixed threshold on distances.
  '''
  return K.mean(K.equal(y_true, K.cast(y_pred > 0.5, y_true.dtype)))

The accuracy for the old model:

  • Accuracy on training set: 99.55%
  • Accuracy on test set: 97.42%

This slight change leads to a model that does not learn anything:

  • Accuracy on training set: 48.64%
  • Accuracy on test set: 48.29%

So my questions are:

1. What is wrong with my reasoning of using Subtract + Dense for the lower part of the Siamese network?

2. Can we fix this? I have two potential solutions in mind, but I am not confident: (1) a convolutional neural network for feature extraction, (2) more Dense layers for the lower part of the Siamese network.

asked by bookmonkie


1 Answer

In the case of two similar examples, after subtracting the two n-dimensional feature vectors (extracted by the common/base feature-extraction model) you will get zero or near-zero values in most positions of the resulting n-dimensional vector, and this vector is what the next/output Dense layer works on.

On the other hand, we all know that in an ANN the weights are learnt in such a way that unimportant features produce very weak responses and prominent/interesting features that contribute towards the decision produce strong responses. Our subtracted feature vector points in just the opposite direction: examples from different classes produce strong responses, and examples from the same class produce weak ones. Furthermore, with a single node in the output layer (and no additional hidden layer before it), it is quite difficult for the model to learn to generate a strong response from near-zero values when the two samples are of the same class. This might be the key point for solving your problem.
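A toy NumPy illustration of that point (the dimension and scales are made up): the pre-activation of a single linear unit is near zero for a same-class difference vector, and of arbitrary sign for large signed differences, so one unit cannot separate the two cases by magnitude alone.

import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=8)                 # weights of a single output unit

same_diff = 0.05 * rng.normal(size=8)  # near-zero difference (same class)
diff_pos  = rng.normal(size=8)         # large signed difference ...
diff_neg  = -diff_pos                  # ... and its mirror image

# w @ x is ~0 for the same-class pair, but of opposite signs for the two
# "different" pairs even though both have the same large magnitude:
# magnitude is not a linear function of the input.
print(w @ same_diff, w @ diff_pos, w @ diff_neg)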

Based on the above discussion, you may want to try the following ideas (a rough sketch follows the list):

  • transform the subtracted feature vector so that similar pairs yield high responses, perhaps by subtracting it from 1, or by taking its reciprocal (multiplicative inverse) followed by normalization;
  • add more Dense layers before the output layer.
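
Here is a rough sketch of these ideas applied to the question's model. It uses the absolute difference (a common variant, not explicitly named above) so the head sees magnitudes rather than signed values, plus one extra hidden Dense layer; the 64-unit width and binary cross-entropy loss are my assumptions, and processed_a/processed_b/input_a/input_b come from the question's code:

from keras.layers import Lambda, Dense
from keras.models import Model
from keras import backend as K

# absolute difference: ~0 for same pairs, large for different pairs,
# regardless of the sign of each component
abs_diff = Lambda(lambda t: K.abs(t[0] - t[1]))([processed_a, processed_b])

# extra hidden layer so the head can map "near-zero difference"
# to a high similarity score
hidden = Dense(64, activation='relu')(abs_diff)
out = Dense(1, activation='sigmoid')(hidden)

model = Model([input_a, input_b], out)
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])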

I won't be surprised if a convolutional neural network instead of stacked Dense layers for feature extraction (as you are thinking) does not improve your accuracy much, as it is just another way of doing the same thing (feature extraction).

answered by Kaushik Roy


