Embedding lookup table doesn't mask padding value

I'm using an embedding_lookup operation to generate dense vector representations for each token in my document, which are fed to a convolutional neural network (the architecture is similar to the one in a WildML article).

Everything works correctly, but when I pad my document by inserting a padding value, the embedding lookup generates a vector for that token too. I think this could alter the results of the classification task. What I want to achieve is something similar to what Torch's LookupTableMaskZero does.
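For example (a minimal sketch; the embedding matrix emb, its shape, and the choice of 0 as the padding id are hypothetical):

  import tensorflow as tf

  emb = tf.Variable(tf.random_uniform([5000, 128], -1.0, 1.0))
  # A document of two real tokens padded to length 4 with id 0;
  # embedding_lookup returns a (trainable, generally non-zero) row
  # for the padding positions as well.
  index = tf.constant([[12, 45, 0, 0]])
  vectors = tf.nn.embedding_lookup(emb, index)  # shape [1, 4, 128]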

1) Is what I want to do correct?

2) Is something like this already implemented?

3) If not, how can I mask the padding value so that no vector is generated for it?

Thank you in advance,

Alessandro

asked May 16 '16 by Alessandro Suglia

1 Answer

@Alessandro Suglia I think this feature would be useful; unfortunately, TensorFlow does not support it right now. One workaround that gets the same result, but is slower, is to look up twice, like below:

  lookup_result = tf.nn.embedding_lookup(emb, index)
  # Build a column-vector mask table: row 0 (the padding id) is 0.0,
  # every other row is 1.0. tf.concat(0, ...) is the pre-TF-1.0 signature.
  masked_emb = tf.concat(0, [tf.zeros([1, 1]),
                             tf.ones([emb.get_shape()[0] - 1, 1])])
  # Look up the mask with the same indices, then zero out the vectors
  # at padding positions (tf.mul is tf.multiply in TF >= 1.0).
  mask_lookup_result = tf.nn.embedding_lookup(masked_emb, index)
  lookup_result = tf.mul(lookup_result, mask_lookup_result)
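
An alternative that avoids the second lookup table is to build the mask directly from the indices. This is just a sketch, assuming id 0 is the padding value and the embedding matrix is float32; the names and shapes are hypothetical:

  import tensorflow as tf

  # Hypothetical setup: vocabulary of 5000 tokens, 128-dim embeddings,
  # with id 0 reserved for padding.
  emb = tf.Variable(tf.random_uniform([5000, 128], -1.0, 1.0))
  index = tf.placeholder(tf.int32, [None, None])  # [batch, seq_len]

  lookup_result = tf.nn.embedding_lookup(emb, index)
  # 1.0 wherever the token id is non-zero, 0.0 at padding positions.
  mask = tf.cast(tf.not_equal(index, 0), tf.float32)
  # Broadcast the mask over the embedding dimension and zero out the
  # vectors generated for padding tokens.
  lookup_result = lookup_result * tf.expand_dims(mask, -1)

This produces the same zeroing with a single embedding_lookup, since tf.not_equal yields the same 0/1 mask that the second lookup table would have returned.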
answered Nov 11 '22 by allen