
How to feed sound as input to neural networks? [closed]

I am planning to build software that classifies a piece of music as good or bad using artificial neural networks. For this, I need to convert audio into numerical values to feed to the network as input. To build a training set, I downloaded the Billboard Hot 100 songs (which I believe should count as good music) and some noise audio files (which will count as bad music). I converted them to .wav format and split each file into multiple 2-second .wav clips. I was planning to use the fast Fourier transform (FFT) to convert these clips into frequency-amplitude pairs, but the problem is that even for a 2-second clip the FFT generates an array of about 100,000 such pairs. Doing this for thousands of audio files would produce a dataset that is too big, with too many features.
Is there any way to shrink this dataset while keeping the 'essence of music' in it, so that good predictions can still be made? Or should I use some other algorithm or process?

Tarun Khare asked Feb 27 '18 12:02




1 Answer

First, extract compact audio features from each clip instead of using the raw FFT output, for example:

1) Compactness.
2) Magnitude spectrum.
3) Mel-frequency cepstral coefficients.
4) Pitch.
5) Power Spectrum.
6) RMS.
7) Rhythm.
8) Spectral Centroid.
9) Spectral Flux.
10) Spectral RollOff Point.
11) Spectral Variability.
12) Zero Crossings.
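
As a rough illustration of how a few of the features above (RMS, zero crossings, spectral centroid) can be computed per frame, here is a minimal sketch using only the Python standard library. The function names, the frame size of 256 samples, the hop of 128, and the synthetic 1 kHz test tone are all assumptions made for this example, not part of the original answer. The naive DFT is only there to keep the sketch dependency-free; a real pipeline would more likely rely on an FFT from a numerical or audio library.

```python
import cmath
import math

def frames(signal, size, hop):
    """Split a list of samples into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def rms(frame):
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossings(frame):
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (only for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0
    n = len(frame)
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total

def frame_features(signal, sample_rate, size=256, hop=128):
    """Return one [rms, zero_crossings, centroid] row per frame."""
    return [[rms(f), zero_crossings(f), spectral_centroid(f, sample_rate)]
            for f in frames(signal, size, hop)]

# Synthetic stand-in for a clip (assumption): 0.1 s of a 1 kHz sine at 8 kHz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(800)]
features = frame_features(tone, SAMPLE_RATE)
```

Each frame is reduced to three numbers here, so a clip becomes a short table of rows (one per frame) rather than a spectrum with tens of thousands of entries.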

After generating the feature set you have two options:

A) Aggregate each feature over a song by taking its mean (and/or variance) across frames, concatenate the aggregated features into a single vector per song, then feed that vector into an artificial neural network to perform the classification.

B) Feed the per-frame feature sequence to a recurrent neural network for the classification task.
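
To make the difference between the two options concrete, the following sketch shows the input each one expects. Everything in it is hypothetical: the `aggregate` and `rnn_score` names, the toy feature rows, and above all the random, untrained weights, which only demonstrate the data flow of a simple (Elman-style) recurrent forward pass. It is not a trained classifier and not the answerer's implementation.

```python
import math
import random
import statistics

def aggregate(frame_rows):
    """Option A: per-feature mean and variance, concatenated into one vector."""
    columns = list(zip(*frame_rows))          # one tuple per feature
    means = [statistics.mean(c) for c in columns]
    variances = [statistics.pvariance(c) for c in columns]
    return means + variances

def rnn_score(frame_rows, w_in, w_rec, w_out):
    """Option B: Elman-style forward pass over the frame sequence."""
    hidden = [0.0] * len(w_rec)
    for row in frame_rows:
        hidden = [math.tanh(sum(wi * x for wi, x in zip(w_in[j], row)) +
                            sum(wr * h for wr, h in zip(w_rec[j], hidden)))
                  for j in range(len(hidden))]
    logit = sum(wo * h for wo, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))     # sigmoid output in (0, 1)

# Hypothetical per-frame features: 4 frames x 2 features.
rows = [[0.1, 1.0], [0.3, 3.0], [0.5, 5.0], [0.7, 7.0]]

# Option A: fixed-length vector regardless of the number of frames.
clip_vector = aggregate(rows)

# Option B: random, untrained weights (assumption, for illustration only).
rng = random.Random(0)
n_hidden, n_features = 3, 2
w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
        for _ in range(n_hidden)]
w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
         for _ in range(n_hidden)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
score = rnn_score(rows, w_in, w_rec, w_out)
```

Option A always yields a vector of twice the number of features, however long the clip is, which suits a plain feed-forward network. Option B keeps the frame order, at the cost of a model that has to be trained on sequences.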

Someone answered Oct 05 '22 10:10
