Given MP3, is it possible to break out different instruments using Fast Fourier transform (FFT)?

Tags: signal-processing, audio, fft

I am working on a music visualizer and I'd like to display a different visual element for each instrument. For example, a blue bar representing vocals, a red bar representing guitar, a yellow bar representing drums, etc.

Is there a way to analyze the results of FFT to get this information?

Thanks.

356

asked Aug 09 '11 10:08

user377782

2 Answers

This is an active area of research in music technology.

It's possible, to an extent, but it's certainly not easy. It will be especially difficult with MP3, because a lot of important information is lost in the lossy compression.

What you're trying to do is known as Audio Source Separation, or Sound Source Separation. It pursues the separation of an audio recording into its constituent elements.

These elements could be speech (several people talking at the same time, the 'cocktail party problem') or instruments (separating one instrument from another in a recording, sometimes called 'blind demixing').

There are various approaches you could take. Some are based on the frequency-domain characteristics of sound and others on its spatial properties.

The frequency-domain approach might appear fairly straightforward if you're trying to separate a bass drum and a flute (i.e. the low-frequency bins of your FFT would be the bass drum and the higher-frequency bins the flute). In reality, however, sounds are rarely neatly segregated into useful frequency regions. The bass drum, for example, will have harmonic content right the way up the frequency spectrum. Solutions of this type are therefore mathematically complicated and often involve statistical modelling. Heavy stuff.
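To see why a simple split by frequency bins falls short, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the two "instruments" are crude stacks of sine waves (not real instrument models), the 500 Hz split point is arbitrary, and the naive DFT stands in for whatever FFT routine you actually use. It measures how much of each signal's energy lands above the split.

```python
import cmath
import math

SAMPLE_RATE = 4000  # Hz; assumed for this sketch
N = 400             # 0.1 s of audio, so each DFT bin is 10 Hz wide

def synth(partials):
    """Sum of sine partials, given as (frequency_hz, amplitude) pairs."""
    return [sum(a * math.sin(2 * math.pi * f * i / SAMPLE_RATE) for f, a in partials)
            for i in range(N)]

def dft_magnitudes(signal):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) DFT; an FFT gives the same values)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(signal)))
            for k in range(n // 2 + 1)]

def high_band_fraction(signal, split_hz):
    """Fraction of the signal's spectral energy in bins above split_hz."""
    bin_hz = SAMPLE_RATE / len(signal)
    energy = [m * m for m in dft_magnitudes(signal)]
    high = sum(e for k, e in enumerate(energy) if k * bin_hz > split_hz)
    return high / sum(energy)

# Crude stand-ins, not real instrument models.
drum = synth([(60 * k, 1 / math.sqrt(k)) for k in range(1, 26)])  # 60 Hz plus overtones up to 1500 Hz
flute = synth([(880, 1.0), (1760, 0.3)])

drum_leak = high_band_fraction(drum, 500)
flute_high = high_band_fraction(flute, 500)
print(f"drum energy above 500 Hz:  {drum_leak:.0%}")
print(f"flute energy above 500 Hz: {flute_high:.0%}")
```

With these made-up signals, a noticeable share of the "drum" energy sits in the bins a naive splitter would hand to the flute, so a visualizer built this way would light up the wrong bar. Real recordings are messier still.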

Separation based on the spatial properties of sound often relies on some prior knowledge of where each source was positioned for the recording (this is 'non-blind'). It's usually necessary to have more than one microphone (a stereo recording at least). With some clever maths, the relationship between the signals at each microphone can be used to separate the sources according to where they are in space. This is also the basis for a technique called beamforming, in which an array of microphones is used to determine the position of a source.
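As a heavily idealised illustration of the non-blind, two-microphone case: assume each microphone hears an instantaneous weighted sum of two sources and that the weights (which depend on where the sources sit) are known in advance. Under those assumptions, separation is just inverting a 2x2 matrix. The gain values and function names below are invented for the sketch; real rooms add delays and reverberation, which this model ignores.

```python
import math

def mix(sources, gains):
    """Each microphone hears a weighted sum of the two sources (no delay, no reverb)."""
    (a, b), (c, d) = gains
    s1, s2 = sources
    mic1 = [a * x + b * y for x, y in zip(s1, s2)]
    mic2 = [c * x + d * y for x, y in zip(s1, s2)]
    return mic1, mic2

def unmix(mics, gains):
    """Recover the sources by inverting the known 2x2 gain matrix."""
    (a, b), (c, d) = gains
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("microphones hear the same mixture; cannot separate")
    m1, m2 = mics
    est1 = [(d * x - b * y) / det for x, y in zip(m1, m2)]
    est2 = [(-c * x + a * y) / det for x, y in zip(m1, m2)]
    return est1, est2

SAMPLE_RATE = 8000  # Hz; assumed
source1 = [math.sin(2 * math.pi * 220 * i / SAMPLE_RATE) for i in range(100)]
source2 = [math.sin(2 * math.pi * 330 * i / SAMPLE_RATE) for i in range(100)]

# Hypothetical gains: source 1 is closer to mic 1, source 2 is closer to mic 2.
gains = ((1.0, 0.4), (0.3, 1.0))

mics = mix((source1, source2), gains)
est1, est2 = unmix(mics, gains)
max_error = max(abs(e - s) for e, s in zip(est1 + est2, source1 + source2))
print(f"largest reconstruction error: {max_error:.2e}")
```

The blind version of the problem is hard precisely because the gains are not known and have to be estimated from the recordings themselves.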

So, back on track. People are trying to do it, but it's complicated, and using MP3 will make your life difficult!

I'm afraid I don't really know enough to explain the approaches better, but I can find a few references to get you started:

http://www.cs.tut.fi/~tuomasv/demopage.html

http://www.cs.northwestern.edu/~pardo/courses/eecs352/lectures/source%20separation.pdf (pdf warning!)

Good luck!

131

answered Nov 02 '22 19:11

Speedy

For the vocal and bass you can use the fact that they are usually in the center of the stereo mix, which means they will have (nearly) the same waveform in the left and right channels. If you subtract one channel from the other you end up with a new channel that will often be without vocal and bass.

Something like:

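sound = LoadMP3(...)                    // decode to PCM samples
length = sound.SampleCount
left = sound.Channels[LEFT]
right = sound.Channels[RIGHT]
difference = new array of length samples
for i = 0 : length - 1                  // last valid index is length - 1
    difference[i] = left[i] - right[i]  // center-panned content cancels out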

Now you can look at clever ways to visualize FFT(left), FFT(right) and FFT(difference).
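Here is a small self-contained sketch of the same idea in plain Python, using synthetic tones instead of a decoded MP3. The "vocal" and "guitar" signals, the pan values and the `magnitude_at` helper (a single DFT component, standing in for one FFT bin) are all assumptions for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz; assumed
N = 800             # 0.1 s, so 440 Hz and 1000 Hz both fall exactly on DFT bins

def tone(freq_hz, amplitude=1.0):
    """A plain sine wave, used here as a stand-in for an instrument."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(N)]

def magnitude_at(signal, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (one DFT bin, normalised)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)

vocal = tone(440)    # center-panned: identical in both channels
guitar = tone(1000)  # panned mostly to the left

left = [v + 0.9 * g for v, g in zip(vocal, guitar)]
right = [v + 0.1 * g for v, g in zip(vocal, guitar)]
difference = [l - r for l, r in zip(left, right)]

for name, channel in (("left", left), ("right", right), ("difference", difference)):
    print(f"{name:>10}: 440 Hz = {magnitude_at(channel, 440):.2f}, "
          f"1000 Hz = {magnitude_at(channel, 1000):.2f}")
```

The center-panned tone disappears from the difference channel while the off-center one survives. With real MP3 material the cancellation will probably be less clean, since the two channels are unlikely to carry exactly identical copies of the center content after lossy encoding.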

Maybe this will take a small step towards the effect that you are after?

answered Nov 02 '22 18:11

Hallgrim
