I need to train a bidirectional LSTM model to recognize discrete speech: individual digits from 0 to 9. I have recorded speech from 100 speakers, and I will split the recordings into individual .wav files, each containing one spoken digit. I will be using MFCCs as features for the network. What should I do next?
Further, I would like to know how the dataset would differ if I were to use a library that supports CTC (Connectionist Temporal Classification).
You can use the answer and guidance provided here.
Depending on which library you use to build your LSTM (PyBrain, Theano, Keras), you can look through its documentation.
I would recommend Theano (Binary LSTM link) or Keras (Tutorial) for this, because they are fairly simple to understand and well documented.
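As a rough illustration of the Keras route, here is a minimal sketch of a bidirectional LSTM classifier over padded MFCC sequences. All layer sizes and the input dimension (13 MFCC coefficients) are illustrative assumptions, not something prescribed above:

```python
# Hypothetical sketch: assumes MFCC inputs padded to shape (batch, time, 13)
# and one digit label (0-9) per file. Layer sizes are illustrative.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # Masking skips zero-padded frames so padding does not affect training.
    layers.Masking(mask_value=0.0, input_shape=(None, 13)),
    # A bidirectional LSTM reads each utterance forwards and backwards.
    layers.Bidirectional(layers.LSTM(64)),
    # One softmax output per utterance: which of the 10 digits was spoken.
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Tiny smoke run with random data standing in for real MFCC features.
X = np.random.randn(8, 50, 13).astype("float32")
y = np.random.randint(0, 10, size=8)
model.fit(X, y, epochs=1, verbose=0)
probs = model.predict(X, verbose=0)
print(probs.shape)  # (8, 10): one probability distribution per utterance
```

You would replace the random arrays with your actual MFCC matrices, one per .wav file, zero-padded to a common length.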
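Regarding the CTC part of the question: the main difference is in how the labels are stored. A plain classifier pairs each (padded) MFCC sequence with a single digit label, while a CTC-capable library expects a label *sequence* per file plus the true input and label lengths; no frame-level alignment is needed, and a blank symbol is reserved. A numpy sketch of the two layouts (all sizes illustrative):

```python
import numpy as np

# Hypothetical sketch of dataset layout; names and sizes are illustrative.
# Assume each .wav file has been converted to a (num_frames, 13) MFCC matrix;
# num_frames varies per recording, so sequences are zero-padded for batching.
rng = np.random.default_rng(0)
num_files = 4
mfcc_dim = 13
frame_counts = [72, 95, 80, 101]        # variable-length utterances
max_len = max(frame_counts)

X = np.zeros((num_files, max_len, mfcc_dim))
for i, t in enumerate(frame_counts):
    X[i, :t] = rng.standard_normal((t, mfcc_dim))

# Plain classification: exactly one digit label (0-9) per file.
y_classification = np.array([3, 7, 0, 9])

# CTC: labels are sequences; an extra blank symbol (e.g. index 10) is
# reserved by the loss, and no frame-level alignment is required. With one
# digit per file each label sequence has length 1, but connected digits
# such as "35" would simply become longer sequences like [3, 5].
y_ctc = [[3], [7], [0], [9]]
input_lengths = np.array(frame_counts)            # true, unpadded frame counts
label_lengths = np.array([len(s) for s in y_ctc])

print(X.shape)  # (4, 101, 13)
```

So for isolated digits the two setups look similar, but CTC removes the need to segment audio when you later move to connected-digit strings.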
Hope this helps.