Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Extract Treble and Bass from audio in iOS

I'm looking for a way to get the treble and bass data from a song for some incrementation of time (say 0.1 seconds) and in the range of 0.0 to 1.0. I've googled around but haven't been able to find anything remotely close to what I'm looking for. Ultimately I want to be able to represent the treble and bass level while the song is playing.

Thanks!

like image 254
Matt Sich Avatar asked Mar 16 '13 22:03

Matt Sich


1 Answers

Its reasonably easy. You need to perform an FFT and then sum up the bins that interest you. A lot of how you select will depend on the sampling rate of your audio.

You then need to choose an appropriate FFT order to get good information in the frequency bins returned.

So if you do an order 8 FFT you will need 256 samples. This will return you 128 complex pairs.

Next you need to convert these to magnitude. This is actually quite simple. if you are using std::complex you can simply perform a std::abs on the complex number and you will have its magnitude (sqrt( r^2 + i^2 )).

Interestingly at this point there is something called Parseval's theorem. This theorem states that after performinng a fourier transform the sum of the bins returned is equal to the sum of mean squares of the input signal.

This means that to get the amplitude of a specific set of bins you can simply add them together divide by the number of them and then sqrt to get the RMS amplitude value of those bins.

So where does this leave you?

Well from here you need to figure out which bins you are adding together.

  1. A treble tone is defined as above 2000Hz.
  2. A bass tone is below 300Hz (if my memory serves me correctly).
  3. Mids are between 300Hz and 2kHz.

Now suppose your sample rate is 8kHz. The Nyquist rate says that the highest frequency you can represent in 8kHz sampling is 4kHz. Each bin thus represents 4000/128 or 31.25Hz.

So if the first 10 bins (Up to 312.5Hz) are used for Bass frequencies. Bin 10 to Bin 63 represent the mids. Finally bin 64 to 127 is the trebles.

You can then calculate the RMS value as described above and you have the RMS values.

RMS values can be converted to dBFS values by performing 20.0f * log10f( rmsVal );. This will return you a value from 0dB (max amplitude) down to -infinity dB (min amplitude). Be aware amplitudes do not range from -1 to 1.

To help you along, here is a bit of my C++ based FFT class for iPhone (which uses vDSP under the hood):

MacOSFFT::MacOSFFT( unsigned int fftOrder ) :
    BaseFFT( fftOrder )
{
    mFFTSetup   = (void*)vDSP_create_fftsetup( mFFTOrder, 0 );
    mImagBuffer.resize( 1 << mFFTOrder );
    mRealBufferOut.resize( 1 << mFFTOrder );
    mImagBufferOut.resize( 1 << mFFTOrder );
}

MacOSFFT::~MacOSFFT()
{
    vDSP_destroy_fftsetup( (FFTSetup)mFFTSetup );
}

bool MacOSFFT::ForwardFFT( std::vector< std::complex< float > >& outVec, const std::vector< float >& inVec )
{
    return ForwardFFT( &outVec.front(), &inVec.front(), inVec.size() );
}

bool MacOSFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
    // Bring in a pre-allocated imaginary buffer that is initialised to 0.
    DSPSplitComplex dspscIn;
    dspscIn.realp = (float*)pIn;
    dspscIn.imagp = &mImagBuffer.front();

    DSPSplitComplex dspscOut;
    dspscOut.realp  = &mRealBufferOut.front();
    dspscOut.imagp  = &mImagBufferOut.front();

    vDSP_fft_zop( (FFTSetup)mFFTSetup, &dspscIn, 1, &dspscOut, 1, mFFTOrder, kFFTDirection_Forward );

    vDSP_ztoc( &dspscOut, 1, (DSPComplex*)pOut, 1, num );

     return true;
}
like image 103
Goz Avatar answered Oct 16 '22 22:10

Goz