I'm using an embedding_lookup operation to generate dense vector representations for each token in my document, which are fed to a convolutional neural network (the network architecture is similar to the one in a WildML article).
Everything works correctly, but when I pad my document by inserting a padding value in it, the embedding lookup generates a vector for this token too. I think this could alter the results of the classification task. What I want to achieve is something similar to what Torch's LookupTableMaskZero does.
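For reference, here is a minimal sketch of the situation (the sizes and the padding id 0 are just placeholders, not my actual setup):

import tensorflow as tf

emb = tf.random.uniform([10, 4])          # embedding matrix; row 0 belongs to the padding id
doc = tf.constant([[7, 2, 0, 0]])         # a document padded with 0s
vecs = tf.nn.embedding_lookup(emb, doc)   # the last two rows are non-zero vectors for the padding id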
1) Is what I want to do correct?
2) Is something like this already implemented?
3) If not, how can I mask the padding value so that no vector is generated for it?
Thank you in advance,
Alessandro
@Alessandro Suglia I think this feature would be useful, but unfortunately TensorFlow does not support it right now. One workaround that gets the same result, although it is slower, is to do the lookup twice, like below:
lookup_result = tf.nn.embedding_lookup(emb, index)
# mask matrix: a zero row for the padding id (row 0), a row of ones for every other id
masked_emb = tf.concat(0, [tf.zeros([1, 1]),
                           tf.ones([emb.get_shape()[0] - 1, 1])])
mask_lookup_result = tf.nn.embedding_lookup(masked_emb, index)
# broadcasted multiply zeroes out the embeddings at padding positions
lookup_result = tf.mul(lookup_result, mask_lookup_result)
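A possibly simpler variant of the same idea, sketched with a newer API (tf.multiply instead of tf.mul; the padding id 0, sizes and names are assumptions), builds the mask directly from the indices instead of doing a second lookup:

import tensorflow as tf

vocab_size, emb_dim = 10, 4                      # hypothetical sizes
emb = tf.random.uniform([vocab_size, emb_dim])   # embedding matrix; row 0 reserved for padding
index = tf.constant([[5, 3, 0, 0]])              # a padded document, 0 = padding id

lookup_result = tf.nn.embedding_lookup(emb, index)            # shape (1, 4, emb_dim)
mask = tf.cast(tf.not_equal(index, 0), lookup_result.dtype)   # 1.0 for real tokens, 0.0 for padding
lookup_result = tf.multiply(lookup_result, tf.expand_dims(mask, -1))  # zero out padding vectors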