Audio signal source separation with neural network

I am trying to separate the audio sources in a raw signal and extract the pitch of each one. I modeled the process myself, as represented below:

[Figure: model to decompose the raw signal]

Each source oscillates in normal modes, so the frequencies of its component peaks are often integer multiples of a fundamental frequency. These components are known as harmonics. Each source is then shaped by resonance, and finally the sources are combined linearly.
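To make the model concrete, here is a minimal sketch of it in Python. The function names, the sample rate, the fundamentals and the per-harmonic amplitudes are all illustrative assumptions; the amplitude list is only a crude stand-in for the resonance stage, which in a real instrument would be a frequency-dependent filter.

```python
import math

def harmonic_source(f0, harmonic_amps, sample_rate, n_samples):
    """Sum of sinusoids at integer multiples of the fundamental f0 (Hz).

    harmonic_amps[k] is the amplitude of harmonic k + 1; it stands in for
    the resonance shaping (an assumption made for this sketch).
    """
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / sample_rate)
            for k, a in enumerate(harmonic_amps))
        for n in range(n_samples)
    ]

def mix(sources):
    """Linear combination: sample-wise sum of all sources."""
    return [sum(samples) for samples in zip(*sources)]

SAMPLE_RATE = 8000  # Hz, illustrative value
N = 512             # number of samples, illustrative value

source_1 = harmonic_source(220.0, [1.0, 0.5, 0.25], SAMPLE_RATE, N)
source_2 = harmonic_source(330.0, [0.8, 0.4], SAMPLE_RATE, N)
raw_signal = mix([source_1, source_2])
```

Generating mixtures this way also yields matching source/mixture pairs, which is useful whenever a method needs known answers to compare against.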

As the model above suggests, the frequency response pattern of an audio signal gives me many hints, but I have almost no idea how to 'separate' the sources. I've tried countless models of my own. This is one of them:

1. Apply an FFT to the PCM data.
2. Get the peak frequency bins and their amplitudes.
3. Calculate the pitch candidate frequency bins.
4. For each pitch candidate, use a recurrent neural network to analyze all the peaks and find the appropriate combination of peaks.
5. Separate the analyzed pitch candidates.
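The first three steps can be sketched as follows. This is only an illustration under stated assumptions: it uses a naive DFT from the standard library to stay dependency-free (a real implementation would use an FFT routine), and the helper names, the threshold and the `max_harmonic` limit are hypothetical choices, not part of my actual model. The neural network steps are left out.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a short demo)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def peak_bins(mags, threshold):
    """Bins that exceed the threshold and are local maxima."""
    return [k for k in range(1, len(mags) - 1)
            if mags[k] > threshold
            and mags[k] > mags[k - 1]
            and mags[k] >= mags[k + 1]]

def pitch_candidates(peaks, max_harmonic=4):
    """Each peak bin divided by 1..max_harmonic, kept when the result is an integer bin."""
    candidates = set()
    for p in peaks:
        for h in range(1, max_harmonic + 1):
            if p % h == 0:
                candidates.add(p // h)
    return sorted(candidates)

# Test signal: two harmonic sources whose partials fall exactly on bins
# 10 and 20 (source 1) and bins 15 and 30 (source 2).
N = 256
partials = {10: 1.0, 20: 0.5, 15: 0.8, 30: 0.4}  # bin -> amplitude
pcm = [sum(a * math.sin(2 * math.pi * b * i / N) for b, a in partials.items())
       for i in range(N)]

mags = magnitude_spectrum(pcm)
peaks = peak_bins(mags, threshold=10.0)
candidates = pitch_candidates(peaks)
```

Note that the candidate list contains the true fundamentals (bins 10 and 15) but also spurious ones such as bin 5, which divides both harmonic series. Deciding which candidates are real is the part I hoped the recurrent network would handle.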

Unfortunately, none of them has successfully separated the signal so far. I would appreciate any advice on solving this kind of problem, especially on modeling source separation as in my attempt above.

886

asked Feb 02 '14 07:02

Laie

1 Answer

Because no one has really attempted to answer this, and because you've marked it with the neural-network tag, I'm going to address the suitability of a neural network to this kind of problem. As the question was somewhat non-technical, this answer will also be "high level".

Neural networks require some sort of sample set from which to learn. In order to "teach" a neural net to solve this problem you would essentially need to have a working set of known solutions to work from. Do you have this? If so, read on. If not, a neural network is probably not what you are seeking. You stated that you have "many hints" but no real solution. This leads me to believe you probably don't have sample sets. If you can get them, great; otherwise you might be out of luck.

Supposing now that you have a sample set of Raw Signal samples and corresponding Source 1 and Source 2 outputs... Well, now you're going to need a method for deciding on a topology. Assuming you don't know a lot about how neural nets work (and don't want to), and assuming you also don't know the exact degree of complexity of the problem, I would probably recommend the open source NEAT package to get you started. I am not affiliated in any way with this project, but I have used it, and it allows you to (relatively) intelligently evolve neural network topologies to fit the problem.

Now, in terms of how a neural net would solve this specific problem. The first thing that comes to mind is that all audio signals are essentially time series. That is to say, the information they convey depends on the data at previous timesteps (e.g. the detection of some waveform cannot be done from a single time-point; it requires information about previous timesteps as well). Again, there are a million ways of solving this problem, but since I'm already recommending NEAT I'd probably suggest you take a look at the C++ NEAT Time Series mod.

If you're going down this route, you'll probably be wanting to use some sort of sliding window to provide information about the recent past at each time step. For a quick and dirty intro to sliding windows, check out this SO question:

Time Series Prediction via Neural Networks

The size of the sliding window can be important, especially if you're not using recurrent neural nets. Recurrent networks allow neural nets to remember previous time steps (at the cost of performance; NEAT is already recurrent, so that choice is made for you here). You will probably want the sliding window length (i.e. the number of past timesteps provided at every time step) to be roughly equal to your conservative guess of the largest number of previous timesteps required to gain enough information to split your waveform.
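As a rough illustration of the sliding window idea, here is a minimal Python sketch. The function name and the tiny example signal are assumptions made for illustration; each window would become one input vector for the network.

```python
def sliding_windows(signal, window_length):
    """Return every run of window_length consecutive samples.

    Each window ends at the "current" timestep and includes the
    window_length - 1 samples that precede it.
    """
    return [signal[end - window_length:end]
            for end in range(window_length, len(signal) + 1)]

signal = [0.0, 0.1, 0.2, 0.3, 0.4]
windows = sliding_windows(signal, window_length=3)
# The first window covers samples 0..2, the last covers samples 2..4.
```

A signal of length L produces L - window_length + 1 windows, so the first few timesteps have no complete window unless you pad the start of the signal.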

I'd say this is probably enough information to get you started.

When it comes to deciding how to provide the neural net with the data, you'll first want to normalise the input signals (consider a sigmoid function) and experiment with different transfer functions (sigmoid would probably be a good starting point).
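A minimal sketch of sigmoid normalisation, assuming plain Python floats as samples. The `logit` helper is the mathematical inverse of the sigmoid and is included because anything squashed this way has to be mapped back to the original amplitude range at some point; both function names are my own choice here.

```python
import math

def sigmoid(x):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(y):
    """Inverse of sigmoid, defined only for 0 < y < 1."""
    return math.log(y / (1.0 - y))

raw_samples = [-2.0, 0.0, 0.5, 2.0]
normalised = [sigmoid(s) for s in raw_samples]
recovered = [logit(y) for y in normalised]
```

Keep in mind that the sigmoid saturates for large magnitudes, so you may want to scale the raw samples into a modest range first; otherwise loud samples all map to values very close to 0 or 1.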

I would imagine you'll want to have 2 output neurons, providing normalised amplitude (which you would denormalise via the inverse of the sigmoid function) as the output representing Source 1 and Source 2 respectively. The fitness value (the way you judge the ability of each tested network to solve the problem) would be something along the lines of the negative of the RMS error of the output signal against the actual known signal (i.e. tested against the samples I was referring to earlier that you will need to procure).
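A fitness function along those lines might look like the sketch below. The function names are hypothetical, and averaging the error over the two sources is one assumption among several reasonable choices (you could also sum them or take the worst one).

```python
import math

def rms_error(predicted, target):
    """Root-mean-square error between two equally long sample lists."""
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
    )

def fitness(predicted_sources, target_sources):
    """Negative mean RMS error across sources: higher is better, 0 is perfect."""
    errors = [rms_error(p, t)
              for p, t in zip(predicted_sources, target_sources)]
    return -sum(errors) / len(errors)

known = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]   # known Source 1 and Source 2
guess = [[0.1, 0.9, 0.0], [0.8, 0.0, 1.0]]   # network output after denormalising

perfect_score = fitness(known, known)
guess_score = fitness(guess, known)
```

Because the value is negated, an evolutionary search that maximises fitness will push the error towards zero.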

Suffice to say, this will not be a trivial operation, but it could work if you have enough samples to train the network against. What is a good number of samples? Well, as a rule of thumb it's roughly a number that is large enough such that a simple polynomial function of order N (where N is the number of neurons in the neural network you require to solve the problem) cannot fit all of the samples accurately. This is basically to ensure you are not simply overfitting the problem, which is a serious issue with neural networks.

I hope this has been helpful! Best of luck.

Additional note: your work to date wouldn't have been in vain if you go down this route. A neural network is likely to benefit from additional "help" in the form of FFTs and other signal modelling "inputs", so you might want to consider taking the signal processing you have already done, organising into an analog, continuous representation and feeding it as an input alongside the input signal.

133

answered Oct 16 '22 18:10

quant

Tags: machine-learning, neural-network, signal-processing, audio, source-separation
