
How to prepare a dataset for speech recognition

I need to train a bidirectional LSTM model to recognize discrete speech (individual digits from 0 to 9). I have recorded speech from 100 speakers. What should I do next? (Suppose I split the recordings into individual .wav files containing one digit per file.) I will be using MFCCs as input features for the network.

Further, I would like to know how the dataset should differ if I use a library that supports CTC (Connectionist Temporal Classification).

asked Sep 26 '22 20:09 by udani


1 Answer

You can use the answer/guidance provided here

Depending on which library you use to build your LSTM (PyBrain, Theano, Keras), look through its documentation for the input format it expects.
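Whichever library you pick, the files usually need to be labelled and split first. Here is a minimal sketch, assuming a hypothetical naming scheme such as `spk007_3.wav` (speaker ID, underscore, spoken digit); the helper names `parse_name` and `split_by_speaker` are made up for illustration. It keeps every speaker entirely in either the training or the test set, so the test score reflects voices the model has not heard.

```python
import random
from collections import defaultdict
from pathlib import Path


def parse_name(path):
    """Split 'spk007_3.wav' into ('spk007', 3). The naming scheme is an assumption."""
    speaker, digit = Path(path).stem.rsplit("_", 1)
    return speaker, int(digit)


def split_by_speaker(paths, test_fraction=0.2, seed=0):
    """Return (train, test) lists of (path, digit) with no speaker in both sets."""
    by_speaker = defaultdict(list)
    for p in paths:
        speaker, digit = parse_name(p)
        by_speaker[speaker].append((str(p), digit))

    # Shuffle speakers (not files) so the split is speaker-disjoint.
    speakers = sorted(by_speaker)
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])

    train, test = [], []
    for s in speakers:
        (test if s in test_speakers else train).extend(by_speaker[s])
    return train, test
```

With 100 speakers and the default `test_fraction`, this would hold out 20 speakers for testing. MFCC extraction would then run over the paths in each list.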

I would recommend Theano (Binary LSTM link) or Keras (Tutorial) for this, because both are fairly simple to understand and well documented.
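On the CTC part of the question, the main difference is in the labels rather than the audio. For plain classification, each single-digit file gets one class label. For CTC, each utterance gets a label sequence plus the number of feature frames and the number of labels, and one extra class index is reserved for the blank symbol, so recordings with several digits would not have to be cut into single-digit files. The sketch below is plain Python and not tied to any library; `BLANK = 10` and the dictionary keys are assumptions, and the position of the blank index differs between libraries, so check the documentation of the one you use.

```python
NUM_DIGITS = 10
BLANK = 10  # assumed: digits use indices 0-9, index 10 is reserved for the CTC blank


def classification_targets(digits):
    """One one-hot vector of length 10 per single-digit utterance."""
    return [[1 if i == d else 0 for i in range(NUM_DIGITS)] for d in digits]


def ctc_targets(transcripts, num_frames):
    """Per utterance: label sequence, its length, and the number of MFCC frames."""
    batch = []
    for labels, frames in zip(transcripts, num_frames):
        if any(l < 0 or l >= NUM_DIGITS for l in labels):
            raise ValueError("labels must be digits 0-9; the blank is never a target")
        if len(labels) > frames:
            raise ValueError("CTC needs at least as many frames as labels")
        batch.append({
            "labels": list(labels),
            "label_length": len(labels),
            "input_length": frames,
        })
    return batch
```

For example, `ctc_targets([[7], [1, 2, 3]], [50, 120])` describes one isolated "7" spanning 50 frames and one unsegmented recording of "1 2 3" spanning 120 frames.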

Hope this helps.

answered Oct 11 '22 05:10 by Nirbhay Tandon