I am implementing a custom loss function in Keras. The model is an autoencoder. The first layer is an Embedding layer, which embeds an input of shape (batch_size, sentence_length) into (batch_size, sentence_length, embedding_dimension). The model then compresses the embedding into a vector of a certain dimension and finally must reconstruct the embedding (batch_size, sentence_length, embedding_dimension).
But the embedding layer is trainable, and the loss must use the weights of the embedding layer (I have to sum over all word embeddings of my vocabulary).
For example, suppose I want to train on the toy sentence "the cat". The sentence_length is 2, the embedding_dimension is 10, and the vocabulary size is 50, so the embedding matrix has shape (50, 10). The Embedding layer's output X has shape (1, 2, 10). It then passes through the model, and the output X_hat also has shape (1, 2, 10).
The model must be trained to maximize the probability that the vector X_hat[0] representing 'the' is the most similar to the vector X[0] representing 'the' in the Embedding layer, and likewise for 'cat'. The loss requires computing the cosine similarity between X and X_hat, normalized by the sum of the cosine similarities between X_hat and every one of the 50 embeddings in the embedding matrix (the rows of the Embedding layer's weight matrix). But how can I access the weights of the embedding layer at each iteration of the training process?
Thank you!
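For what it's worth, Keras exposes the Embedding layer's weight matrix as a live trainable variable, layer.embeddings (also reachable as layer.weights[0]), so a custom loss built as a closure over the layer can read the current weights at every training step, unlike get_weights(), which returns numpy copies. A minimal sketch, assuming the shapes from the toy example above (make_loss and loss_fn are hypothetical names):

from tensorflow.keras import backend as K
from tensorflow.keras.layers import Embedding

embedding_layer = Embedding(input_dim=50, output_dim=10)

def make_loss(emb_layer):
    def loss_fn(y_true, y_pred):
        # emb_layer.embeddings is the live weight variable, not a numpy
        # copy, so it always holds the current embeddings during training
        W = emb_layer.embeddings                # shape (50, 10)
        # ... W can now enter the loss expression, e.g. the normalizing
        # sum over the whole vocabulary described above
        return K.mean(K.square(y_pred - y_true), axis=-1)
    return loss_fn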
From the Keras team on GitHub: the loss_weights parameter of compile is used to define how much each of your model's output losses contributes to the final loss value, i.e. it weights the model's output losses. You could have a model with two outputs, where one is the primary output and the other is auxiliary.
A custom loss function can be created by defining a function that takes the true values and the predicted values as required parameters. The function should return an array of losses and can then be passed at the compile stage.
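A minimal illustration of that pattern (the model and the loss name here are hypothetical):

import tensorflow as tf

def my_mae(y_true, y_pred):
    # returns one loss value per sample, as described above
    return tf.reduce_mean(tf.abs(y_true - y_pred), axis=-1)

# toy model, just to show where the custom function is passed
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer='adam', loss=my_mae)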
sparse_categorical_crossentropy: used as a loss function for multi-class classification models where the output label is assigned an integer value (0, 1, 2, 3, ...). This loss function is mathematically the same as categorical_crossentropy; it just has a different interface.
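Concretely, the only difference is the label format (a small illustration, not from the original answer):

import numpy as np
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1],
                      [0.5, 2.5, 0.3]])
labels_int = np.array([0, 1])                 # integer class ids
labels_onehot = np.array([[1., 0., 0.],
                          [0., 1., 0.]])      # the same labels, one-hot

scce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
cce = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
# both compute the same value; only the target encoding differs
print(scce(labels_int, logits).numpy(), cce(labels_onehot, logits).numpy())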
It seems a bit crazy, but it seems to work: instead of creating a custom loss function that I would pass to model.compile, the network computes the loss (Eq. 1 from arxiv.org/pdf/1708.04729.pdf) in a function that I call with Lambda:
loss = Lambda(lambda x: similarity(x[0], x[1], x[2]))([X_hat, X, embedding_matrix])
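The original post does not show similarity; below is one hedged sketch of the idea described in the question (cosine similarity with the target embedding, softmax-normalized over the whole vocabulary; the exact form of Eq. 1 in the paper may differ):

from tensorflow.keras import backend as K

def similarity(x_hat, x, embeddings):
    # x_hat, x: (batch, sentence_length, embedding_dim)
    # embeddings: (vocab_size, embedding_dim)
    x_hat_n = K.l2_normalize(x_hat, axis=-1)
    x_n = K.l2_normalize(x, axis=-1)
    emb_n = K.l2_normalize(embeddings, axis=-1)
    # cosine similarity between each reconstruction and its target embedding
    pos = K.sum(x_hat_n * x_n, axis=-1)                  # (batch, length)
    # cosine similarity between each reconstruction and every vocabulary entry
    all_sims = K.dot(x_hat_n, K.transpose(emb_n))        # (batch, length, vocab)
    # normalize over the vocabulary, then take the negative log-likelihood,
    # averaged over the sentence; shape (batch, 1) so it fits a model output
    prob = K.exp(pos) / K.sum(K.exp(all_sims), axis=-1)
    return -K.mean(K.log(prob), axis=-1, keepdims=True)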
The network then has two outputs, X_hat and loss, but I give X_hat a weight of 0 and loss all of the weight:
model = Model(input_sequence, [X_hat, loss])
model.compile(loss=mean_squared_error,
              optimizer=optimizer,
              loss_weights=[0., 1.])
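A side note, not in the original answer: since the second output already is the loss value, an equivalent and slightly more direct option is a pass-through loss for it, keeping MSE only for the (zero-weighted) reconstruction output:

model.compile(loss=[mean_squared_error, lambda y_true, y_pred: y_pred],
              optimizer=optimizer,
              loss_weights=[0., 1.])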
When I train the model:

for i in range(epochs):
    for j in range(num_data):
        # get_weights() returns numpy copies, so it must be re-read at every
        # iteration to pick up the embeddings as they are updated
        input_embedding = model.layers[1].get_weights()[0][data[j:j+1]]
        y = [input_embedding, 0]  # targets: the current embedding of the input, and 0 for the loss output
        model.fit(data[j:j+1], y, batch_size=1, ...)
That way, the model is trained to drive loss toward 0, and when I want to use the trained model's predictions, I use the first output, which is the reconstruction X_hat.
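At inference time that looks like this (a sketch; new_sequences is a hypothetical batch of token ids of shape (n, sentence_length)):

# only the first output, the reconstruction, matters at test time
X_hat_pred, _ = model.predict(new_sequences)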