I'm learning neural networks and trying to build a speaker recognition system with TensorFlow. I want to understand how utterance length affects a neural network. For example, suppose I have 1000 different sound recordings that all have the same length and 1000 different sound recordings with varying lengths. How, theoretically, would a neural network work with each kind of data? Would a neural network trained on a database of same-length recordings do better or worse? Why?
I assume your question can be reformulated as: how can a neural network process audio of different lengths?
The trick is that a signal of arbitrary length is converted into a sequence of fixed-size feature vectors. See my answers here and here.
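To make that concrete, here is a minimal sketch (not part of the original answer) of frame-level feature extraction using librosa's MFCC function; librosa itself and the file names are assumptions, and any frame-level feature (log filterbanks, spectrogram columns) would work the same way:

```python
# Sketch: turning variable-length audio into fixed-size feature vectors.
# Assumes librosa is installed; file paths below are hypothetical.
import librosa

def extract_frames(path, n_mfcc=20):
    # Load the recording; its duration can be anything.
    signal, sr = librosa.load(path, sr=16000)
    # Each column is one feature vector computed over a short window,
    # so a longer recording yields more frames, not larger vectors.
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T  # shape: (num_frames, n_mfcc)

# Two recordings of different lengths give sequences of the same vector size:
# frames_a = extract_frames("utterance_a.wav")  # e.g. (312, 20)
# frames_b = extract_frames("utterance_b.wav")  # e.g. (987, 20)
```

The network then operates on these fixed-size vectors (per frame, or pooled/averaged over frames), so the original recording length only changes how many frames there are, not the input dimensionality.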