I'm experimenting with TensorFlow for speech recognition.
My inputs are waveforms and my outputs are words.
A waveform looks like this:
[0,0,0,-2,3,-4,-1,7,0,0,0...0,0,0,20,-11,4,0,0,1,...]
The words are an array of numbers, where each number represents a word:
[12,4,2,3]
After training, I also want to find the correlation between input and output for each output label.
For example, I want to know which input neurons (samples) are responsible for the first label (here 12):
[0,0.01,0.10,0.99,0.77,0.89,0.99,0.79,0.22,0.11,0...0,0,0,0,0,0,0,0,0,...]
The original input values would be replaced with the correlation, where 0 means no correlation and 1 means total correlation.
The goal is to find the position where each word starts.
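As a minimal sketch of how such a per-sample score vector could be turned into a start position: the helper name `word_onset` and the 0.5 threshold are assumptions made for illustration, and the scores are the example values above with the elided part left out.

```python
def word_onset(scores, threshold=0.5):
    """Return the index of the first sample whose score reaches the
    threshold, or None if no sample does.

    Both the function name and the default threshold are hypothetical;
    a suitable threshold depends on how the scores are produced.
    """
    for index, score in enumerate(scores):
        if score >= threshold:
            return index
    return None


# Example scores for the first label (12), truncated for illustration.
scores = [0, 0.01, 0.10, 0.99, 0.77, 0.89, 0.99, 0.79, 0.22, 0.11, 0, 0]
onset = word_onset(scores)
print(onset)
```

A fixed threshold is the simplest choice; a noisy score vector might need smoothing first.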
Is there a function in TensorFlow to get this correlation?
The problem, restated: translate a sequence of data (X) into another sequence of data (Y), and report which parts of X contributed to each part of Y.
This is a well-known problem, and tensorflow.org has a fantastic example for it: neural machine translation with attention.
The example code shows how to translate X (Spanish) into Y (English) and report which part of X contributes to the decision for each part of Y (the attention).
The same principle and code can be used to translate X (wave data) into Y (words) and report which part of the wave data contributes to each word via the attention readout.
The attention layer in the example is called attention_layer.
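To make the attention readout concrete, here is a minimal sketch in plain Python rather than TensorFlow. It uses simple dot-product scoring, which is an assumption chosen for brevity and not necessarily the scoring used by the tutorial's layer. The names `attention_weights`, `encoder_states`, and `query`, and all the numbers, are made up for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax: outputs are in [0, 1] and sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, encoder_states):
    """Score every input position against the decoder query with a dot
    product, then normalize the scores into weights with softmax."""
    scores = [
        sum(q * h for q, h in zip(query, state))
        for state in encoder_states
    ]
    return softmax(scores)


# Hypothetical encoder outputs, one vector per input frame of the waveform.
encoder_states = [
    [0.0, 0.1],
    [0.2, 0.0],
    [2.0, 1.6],
    [1.8, 1.7],
    [0.1, 0.0],
]
# Hypothetical decoder state while emitting one output word.
query = [1.0, 1.0]

weights = attention_weights(query, encoder_states)
# The input position with the largest weight contributed most to this word.
peak = max(range(len(weights)), key=weights.__getitem__)
print(weights, peak)
```

Computing one such weight vector per output word gives a row per word over the input positions, which is the kind of 0-to-1 readout the question asks for. Note that these are attention weights, not statistical correlations.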