I'm trying to train an LSTM model for speech recognition but don't know what training data and target data to use. I'm using the LibriSpeech dataset, which contains both audio files and their transcripts. At this point, I know the target data will be the vectorized transcript text. As for the training data, I was thinking of using the frequency and time information from each audio file (or MFCC features). If that is the correct way to approach the problem, the training data/audio will be multiple arrays; how would I input those arrays into my LSTM model? Will I have to vectorize them?
Thanks!
The benefit of deep LSTM-RNNs over conventional LSTM-RNNs is that they use their parameters more efficiently by distributing them across multiple layers. Deep LSTM-RNNs have given good results in large-vocabulary speech recognition tasks [15], [31].
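For concreteness, here is a minimal sketch of such a stacked (deep) LSTM in Keras. The layer sizes, the 13 MFCC input coefficients, and the 29-character output alphabet are illustrative assumptions, not values from the cited papers:

```python
# A minimal stacked-LSTM sketch in Keras; all sizes are illustrative.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

model = Sequential([
    # return_sequences=True passes the full frame-by-frame sequence on
    # to the next layer, which is what makes the network "deep"
    LSTM(128, return_sequences=True, input_shape=(None, 13)),  # (time, 13 MFCCs)
    LSTM(128, return_sequences=True),
    # One character distribution per audio frame,
    # e.g. 26 letters + space + apostrophe + blank
    TimeDistributed(Dense(29, activation="softmax")),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```

The `input_shape=(None, 13)` leaves the time dimension unspecified, so the same model accepts utterances of different lengths.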
Two popular sets of features often used in the analysis of the speech signal are the Mel-frequency cepstral coefficients (MFCC) and the linear prediction cepstral coefficients (LPCC). The most popular recognition models are vector quantization (VQ), dynamic time warping (DTW), and artificial neural networks (ANN) [3].
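As an example, here is a hedged sketch of extracting MFCC frames with the python_speech_features package (the library pinned later in this answer). The file path is a placeholder, and note that LibriSpeech ships FLAC files, which you would convert to WAV first:

```python
# MFCC extraction sketch with python_speech_features; "audio.wav" is a
# placeholder (LibriSpeech is distributed as 16 kHz FLAC; convert first).
import scipy.io.wavfile as wav
from python_speech_features import mfcc

rate, signal = wav.read("audio.wav")
features = mfcc(signal, samplerate=rate, numcep=13)
print(features.shape)  # (num_frames, 13): one 13-dim vector per ~25 ms frame
```

Each audio file thus becomes a 2D array of shape (frames, coefficients), and different files produce different numbers of frames.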
So if you want to build your own speech recognition service and you have enough data, why go with those hosted services? You can train your own model. Luckily, there is an open-source model available, based on Baidu's Deep Speech research paper and referred to as Mozilla DeepSpeech.
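If you go that route, a minimal usage sketch with the deepspeech Python package might look like the following. The model and scorer filenames follow the released 0.9.3 artifacts, but treat the exact paths as placeholders:

```python
# Sketch of transcribing a clip with Mozilla DeepSpeech (pip install deepspeech).
# Audio must be 16-bit, 16 kHz, mono PCM; filenames are placeholders.
import wave
import numpy as np
from deepspeech import Model

ds = Model("deepspeech-0.9.3-models.pbmm")
ds.enableExternalScorer("deepspeech-0.9.3-models.scorer")  # optional language model

with wave.open("audio.wav", "rb") as w:
    audio = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

print(ds.stt(audio))  # plain-text transcription
```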
These models take in audio and directly output transcriptions. Two of the most popular end-to-end models today are Deep Speech by Baidu and Listen, Attend and Spell (LAS) by Google. Both Deep Speech and LAS are recurrent neural network (RNN) based architectures with different approaches to modeling speech recognition.
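Deep Speech trains with the CTC objective, which lets the network emit a character distribution per audio frame without needing frame-level alignments between audio and text. Here is a hedged sketch of computing a CTC loss with Keras' built-in `ctc_batch_cost`; every shape and size in it is illustrative:

```python
# CTC loss sketch with Keras; batch size, sequence lengths, and the
# 29-symbol vocabulary (characters + CTC blank) are all made up.
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

batch, time_steps, vocab = 2, 50, 29
y_pred = tf.nn.softmax(tf.random.uniform((batch, time_steps, vocab)))  # per-frame char probabilities
y_true = np.random.randint(1, vocab - 1, size=(batch, 10))             # integer-encoded transcripts
input_length = np.full((batch, 1), time_steps)                         # frames per utterance
label_length = np.full((batch, 1), 10)                                 # characters per transcript

loss = K.ctc_batch_cost(y_true, y_pred, input_length, label_length)
print(loss.shape)  # one loss value per utterance: (2, 1)
```

This is the piece that connects the variable-length MFCC input to the variable-length transcript target.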
This strategy is especially helpful when data is scarce or when your model is overfitting. For speech recognition, you can use the standard augmentation techniques, like changing the pitch or speed, injecting noise, and adding reverb to your audio data. We found spectrogram augmentation (SpecAugment) to be a much simpler and more effective approach, as sketched below.
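Here is a minimal NumPy sketch of the two core SpecAugment operations, frequency masking and time masking, applied to a spectrogram. The mask widths are made-up hyperparameters:

```python
# SpecAugment-style masking sketch: zero out a random band of frequencies
# and a random span of time frames; mask widths are illustrative.
import numpy as np

def spec_augment(spec, max_freq_mask=8, max_time_mask=20):
    spec = spec.copy()
    num_freqs, num_frames = spec.shape

    f = np.random.randint(0, max_freq_mask)   # frequency mask width
    f0 = np.random.randint(0, num_freqs - f)  # where the band starts
    spec[f0:f0 + f, :] = 0

    t = np.random.randint(0, max_time_mask)   # time mask width
    t0 = np.random.randint(0, num_frames - t) # where the span starts
    spec[:, t0:t0 + t] = 0
    return spec

augmented = spec_augment(np.random.rand(80, 300))  # e.g. 80 mel bins x 300 frames
```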
LSTM models are used for temporal dependencies, where the previous output is also an input at the current timestep. You will need to install the following library: python-speech-features==0.6. The sketch below shows one way to batch the resulting feature arrays.
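To address the "multiple arrays" part of the question directly: one common approach is to keep each utterance as a (time, features) array rather than flattening it into a single vector, and pad the arrays to a common length so a batch becomes one (batch, time, features) tensor. A sketch, assuming Keras' `pad_sequences`, with all shapes made up:

```python
# Batching variable-length MFCC arrays for an LSTM; utterance lengths
# (120, 95, 140 frames) and the 13 coefficients are illustrative.
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Pretend MFCC output for three utterances of different durations
utterances = [np.random.rand(n, 13) for n in (120, 95, 140)]

# Pad along the time axis so every utterance becomes (140, 13);
# the LSTM then sees a single (batch, time, features) tensor
batch = pad_sequences(utterances, dtype="float32", padding="post")
print(batch.shape)  # (3, 140, 13)
```

You can add a `Masking` layer (or pass sample weights) so the padded frames do not contribute to the loss.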
To prepare the speech dataset for feeding into the LSTM model, see this post - Building Speech Dataset for LSTM binary classification - and in particular its Data Preparation section.
As a good example, see this post - http://danielhnyk.cz/predicting-sequences-vectors-keras-using-rnn-lstm/ - which covers how to predict a sequence of vectors in Keras using an RNN/LSTM.
I believe you will find this post (https://stats.stackexchange.com/questions/192014/how-to-implement-a-lstm-based-classifier-to-classify-speech-files-using-keras) very helpful too.