The sequence_Loss
module's source_code has three parameters that are required they list them as outputs, targets, and weights.
Outputs and targets are self explanatory, but I'm looking to better understand is what is the weight
parameter?
The other thing I find confusing is that it states that the targets
should be the same length
as the outputs, what exactly do they mean by the length of a tensor? Especially if its a 3 dimensional tensor.
Think of the weights as a mask applied to the input tensor. In some NLP applications, we often have different sentence length for each sentence. In order to parallel/batch multiple instance sentences into a minibatch to feed into a neural net, people use a mask matrix to denotes which element in the the input tensor is actually a valid input. For instance, the weight can be a np.ones([batch, max_length])
that means all of the input elements are legit.
We can also use a matrix of the same shape as the labels such as np.asarray([[1,1,1,0],[1,1,0,0],[1,1,1,1]])
(we assume the labels shape is 3x4), then the crossEntropy of the first row last column will be masked out as 0.
You can also use weight to calculate weighted accumulation of cross entropy.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With