
Bit-shifting audio samples from Float32 to SInt16 results in severe clipping

I'm new to iOS and its C underpinnings, but not to programming in general. My dilemma is this: I'm implementing an echo effect in a complex AudioUnits-based application. The application needs reverb, echo, and compression, among other things. However, the echo only works correctly when I use a particular AudioStreamBasicDescription format for the audio samples generated in my app, and that format doesn't work with the other AudioUnits. While there are other ways to solve this problem, fixing the bit-twiddling in the echo algorithm might be the most straightforward approach.

The AudioStreamBasicDescription that works with echo has an mFormatFlags value of kAudioFormatFlagsAudioUnitCanonical. Its specifics are:

AudioUnit Stream Format (ECHO works, NO AUDIO UNITS)
Sample Rate:              44100
Format ID:                 lpcm
Format Flags:              3116 = kAudioFormatFlagsAudioUnitCanonical
Bytes per Packet:             4
Frames per Packet:            1
Bytes per Frame:              4
Channels per Frame:           2
Bits per Channel:            32
Set ASBD on input
Set ASBD on  output
au SampleRate rate: 0.000000, 2 channels, 12 formatflags, 1819304813 mFormatID, 16 bits per channel

The stream format that works with AudioUnits is the same except for mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved. Its specifics are:

AudioUnit Stream Format (NO ECHO, AUDIO UNITS WORK)
Sample Rate:              44100
Format ID:                 lpcm
Format Flags:                41 
Bytes per Packet:             4
Frames per Packet:            1
Bytes per Frame:              4
Channels per Frame:           2
Bits per Channel:            32
Set ASBD on input
Set ASBD on  output
au SampleRate rate: 44100.000000, 2 channels, 41 formatflags, 1819304813 mFormatID, 32 bits per channel

To create the echo effect I use two functions that bit-shift sample data into SInt16 space and back. As I said, this works for the kAudioFormatFlagsAudioUnitCanonical format, but not for the other. When it fails, the sounds are clipped and distorted, but they are there. I think this indicates that the difference between these two formats is how the data is arranged within each 32-bit sample.

// convert sample vector from fixed point 8.24 to SInt16
void fixedPointToSInt16( SInt32 * source, SInt16 * target, int length ) {
    int i;
    for(i = 0;i < length; i++ ) {
        target[i] =  (SInt16) (source[i] >> 9);
        //target[i] *= 0.003;

    }
}

As you can see, I tried scaling the amplitude of the samples to get rid of the clipping; clearly that didn't work.

// convert sample vector from SInt16 to fixed point 8.24 
void SInt16ToFixedPoint( SInt16 * source, SInt32 * target, int length ) {
    int i;
    for(i = 0;i < length; i++ ) {
        target[i] =  (SInt32) (source[i] << 9);
        if(source[i] < 0) { 
            target[i] |= 0xFF000000;
        }
        else {
            target[i] &= 0x00FFFFFF;
        }
    }
}

If I can determine the difference between kAudioFormatFlagsAudioUnitCanonical and kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved, then I can modify the above functions accordingly. But I'm not sure how to figure that out. The CoreAudio documentation is enigmatic, but from what I've read there, and gleaned from the CoreAudioTypes.h file, both mFormatFlags values refer to the same fixed-point 8.24 format. Clearly something is different, but I can't figure out what.

Thanks for reading through this long question, and thanks in advance for any insight you can provide.

Joshua Weinberg asked Apr 09 '12 12:04


1 Answer

kAudioFormatFlagIsFloat means that the buffer contains floating point values. If mBitsPerChannel is 32 then you are dealing with float data (also called Float32), and if it is 64 you are dealing with double data.

kAudioFormatFlagsNativeEndian refers to the fact that the data in the buffer matches the endianness of the processor, so you don't have to worry about byte swapping.

kAudioFormatFlagIsPacked means that every bit in the data is significant. For example, if you store 24 bit audio data in 32 bits, this flag will not be set.

kAudioFormatFlagIsNonInterleaved means that each individual buffer consists of one channel of data. It is common for audio data to be interleaved, with the samples alternating between L and R channels: LRLRLRLR. For DSP applications it is often easier to deinterleave the data and work on one channel at a time.
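To see how the two flag values from the question differ, it can help to decode them bit by bit. The following is a minimal sketch, not CoreAudio code: the `kFlag...` constants are local stand-ins that I believe mirror the linear PCM flag bits in CoreAudioTypes.h, and `fractionBits` and `describeFormatFlags` are hypothetical helper names introduced only for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-ins for the linear PCM flag bits in CoreAudioTypes.h
   (assumed values; defined here so the sketch builds without CoreAudio). */
enum {
    kFlagIsFloat          = 1u << 0,
    kFlagIsBigEndian      = 1u << 1,
    kFlagIsSignedInteger  = 1u << 2,
    kFlagIsPacked         = 1u << 3,
    kFlagIsAlignedHigh    = 1u << 4,
    kFlagIsNonInterleaved = 1u << 5,
    kFlagFractionShift    = 7,          /* fixed-point fraction bit count starts here */
    kFlagFractionMask     = 0x3Fu << 7
};

/* Hypothetical helper: number of fractional bits encoded in the flags
   (24 for fixed-point 8.24, 0 when the samples are not fixed point). */
unsigned fractionBits(uint32_t flags) {
    return (flags & kFlagFractionMask) >> kFlagFractionShift;
}

/* Hypothetical helper: print which flag bits are set. */
void describeFormatFlags(uint32_t flags) {
    printf("flags %u:%s%s%s%s%s%s, fraction bits: %u\n",
           (unsigned) flags,
           (flags & kFlagIsFloat)          ? " float"          : "",
           (flags & kFlagIsBigEndian)      ? " big-endian"     : "",
           (flags & kFlagIsSignedInteger)  ? " signed-integer" : "",
           (flags & kFlagIsPacked)         ? " packed"         : "",
           (flags & kFlagIsAlignedHigh)    ? " aligned-high"   : "",
           (flags & kFlagIsNonInterleaved) ? " non-interleaved": "",
           fractionBits(flags));
}
```

Calling `describeFormatFlags(3116)` and `describeFormatFlags(41)` with the two values from the question should show that 3116 has the signed-integer bit set with 24 fraction bits (fixed-point 8.24), while 41 has the float bit set and no fraction bits.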

I think in your case the error is that you are treating floating point data as fixed point. Float data is generally scaled to the interval [-1, +1). To convert float to SInt16 you need to multiply each sample by 32768 (1u << 15, the magnitude of the most negative 16-bit value) and then clip to the interval [-32768, 32767].
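A minimal sketch of that conversion, and its inverse, might look like the following. The function names `floatToSInt16` and `sint16ToFloat` are hypothetical, and `float` / `int16_t` stand in for Float32 / SInt16 so that the code compiles without the Apple headers. It assumes the input contains no NaN values.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert float samples (nominally in [-1, +1)) to 16-bit signed integers.
   Out-of-range input is clipped rather than allowed to wrap around. */
void floatToSInt16(const float *source, int16_t *target, size_t length) {
    for (size_t i = 0; i < length; i++) {
        float scaled = source[i] * 32768.0f;
        if (scaled > 32767.0f)  scaled = 32767.0f;   /* clip positive peaks */
        if (scaled < -32768.0f) scaled = -32768.0f;  /* clip negative peaks */
        target[i] = (int16_t) scaled;                /* truncates toward zero */
    }
}

/* Convert 16-bit signed integers back to float in [-1, +1). */
void sint16ToFloat(const int16_t *source, float *target, size_t length) {
    for (size_t i = 0; i < length; i++) {
        target[i] = (float) source[i] / 32768.0f;
    }
}
```

With these in place of the bit-shifting functions, the echo code could keep working on SInt16 data while the rest of the graph stays in the float format. Since that format is non-interleaved, each channel's buffer would be converted separately.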

sbooth answered Nov 11 '22 20:11