I'm trying to implement sentence similarity architecture based on this work using the STS dataset. Labels are normalized similarity scores from 0 to 1 so it is assumed to be a regression model.
My problem is that the loss goes directly to NaN starting from the first epoch. What am I doing wrong?
I have already tried updating to latest keras and theano versions.
The code for my model is:
def create_lstm_nn(input_dim):
    seq = Sequential()`
    # embedd using pretrained 300d embedding
    seq.add(Embedding(vocab_size, emb_dim, mask_zero=True, weights=[embedding_weights]))
    # encode via LSTM
    seq.add(LSTM(128))
    seq.add(Dropout(0.3))
    return seq
lstm_nn = create_lstm_nn(input_dim)
input_a = Input(shape=(input_dim,))
input_b = Input(shape=(input_dim,))
processed_a = lstm_nn(input_a)
processed_b = lstm_nn(input_b)
cos_distance = merge([processed_a, processed_b], mode='cos', dot_axes=1)
cos_distance = Reshape((1,))(cos_distance)
distance = Lambda(lambda x: 1-x)(cos_distance)
model = Model(input=[input_a, input_b], output=distance)
# train
rms = RMSprop()
model.compile(loss='mse', optimizer=rms)
model.fit([X1, X2], y, validation_split=0.3, batch_size=128, nb_epoch=20)
I also tried using a simple Lambda instead of the Merge layer, but it has the same result.
def cosine_distance(vests):
    x, y = vests
    x = K.l2_normalize(x, axis=-1)
    y = K.l2_normalize(y, axis=-1)
    return -K.mean(x * y, axis=-1, keepdims=True)
def cos_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0],1)
distance = Lambda(cosine_distance, output_shape=cos_dist_output_shape)([processed_a, processed_b])
                The nan is a common issue in deep learning regression. Because you are using Siamese network, you can try followings:
It is not easy to make deep learning work perfectly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With