I'm new to iOS and its C underpinnings, but not to programming in general. My dilemma is this: I'm implementing an echo effect in a complex AudioUnits-based application. The application needs reverb, echo, and compression, among other things. However, the echo only works right when I use a particular AudioStreamBasicDescription format for the audio samples generated in my app. That format, however, doesn't work with the other AudioUnits. While there are other ways to solve this problem, fixing the bit-twiddling in the echo algorithm might be the most straightforward approach.
The AudioStreamBasicDescription that works with echo has an mFormatFlags of kAudioFormatFlagsAudioUnitCanonical. Its specifics are:
AudioUnit Stream Format (ECHO works, NO AUDIO UNITS)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 3116 = kAudioFormatFlagsAudioUnitCanonical
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 0.000000, 2 channels, 12 formatflags, 1819304813 mFormatID, 16 bits per channel
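For reference, here is a minimal sketch of how that first format might be filled in code. The field values are taken directly from the dump above; the struct layout and constants come from CoreAudioTypes.h, but this is an illustration, not necessarily how my app sets it up:

#include <CoreAudio/CoreAudioTypes.h>

// Sketch: the canonical 8.24 fixed-point format dumped above.
// kAudioFormatFlagsAudioUnitCanonical expands to signed-integer |
// native-endian | packed | non-interleaved, with 24 fraction bits
// encoded in the flags (which is where the value 3116 comes from).
AudioStreamBasicDescription canonicalFormat = {
    .mSampleRate       = 44100.0,
    .mFormatID         = kAudioFormatLinearPCM,               // 'lpcm' == 1819304813
    .mFormatFlags      = kAudioFormatFlagsAudioUnitCanonical, // 3116
    .mBytesPerPacket   = 4,
    .mFramesPerPacket  = 1,
    .mBytesPerFrame    = 4,
    .mChannelsPerFrame = 2,
    .mBitsPerChannel   = 32
};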
The stream format that works with the AudioUnits is the same except for mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved. Its specifics are:
AudioUnit Stream Format (NO ECHO, AUDIO UNITS WORK)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 41
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 44100.000000, 2 channels, 41 formatflags, 1819304813 mFormatID, 32 bits per channel
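Again as a sketch (same caveats as above), the second format differs only in mFormatFlags; 41 = IsFloat (1) | IsPacked (8) | IsNonInterleaved (32), with NativeEndian contributing 0 on this platform:

// Sketch: the Float32 format dumped above (mFormatFlags == 41).
AudioStreamBasicDescription floatFormat = {
    .mSampleRate       = 44100.0,
    .mFormatID         = kAudioFormatLinearPCM,
    .mFormatFlags      = kAudioFormatFlagIsFloat
                       | kAudioFormatFlagsNativeEndian
                       | kAudioFormatFlagIsPacked
                       | kAudioFormatFlagIsNonInterleaved,
    .mBytesPerPacket   = 4,
    .mFramesPerPacket  = 1,
    .mBytesPerFrame    = 4,
    .mChannelsPerFrame = 2,
    .mBitsPerChannel   = 32
};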
To create the echo effect I use two functions that bit-shift sample data into SInt16 space and back. As I said, this works for the kAudioFormatFlagsAudioUnitCanonical format, but not the other. When it fails, the sounds are clipped and distorted, but they are there. I think this indicates that the difference between these two formats is how the data is arranged within each 32-bit sample.
// convert sample vector from fixed point 8.24 to SInt16
void fixedPointToSInt16( SInt32 * source, SInt16 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        // drop the 9 low-order bits: full-scale 8.24 maps onto the SInt16 range
        target[i] = (SInt16) (source[i] >> 9);
        //target[i] *= 0.003;
    }
}
(As you can see from the commented-out line, I tried scaling down the amplitude of the samples to get rid of the clipping; clearly that didn't work.)
// convert sample vector from SInt16 to fixed point 8.24
void SInt16ToFixedPoint( SInt16 * source, SInt32 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt32) (source[i] << 9);
        if( source[i] < 0 ) {
            // negative: force the sign bits in the top byte
            target[i] |= 0xFF000000;
        }
        else {
            // positive: keep the top byte clear
            target[i] &= 0x00FFFFFF;
        }
    }
}
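For context, a hypothetical sketch of how these helpers sit in the render path; the buffer names and processEcho() are illustrative placeholders, not from my actual code:

// Hypothetical usage: convert 8.24 samples to SInt16, run the echo
// DSP, then convert back. samples8_24, scratch16, numSamples, and
// processEcho() are illustrative only.
fixedPointToSInt16( samples8_24, scratch16, numSamples );
processEcho( scratch16, numSamples );   // the echo algorithm itself
SInt16ToFixedPoint( scratch16, samples8_24, numSamples );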
If I can determine the difference between kAudioFormatFlagsAudioUnitCanonical and kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved, then I can modify the above methods accordingly. But I'm not sure how to figure that out. The Core Audio documentation is enigmatic, but from what I've read there, and gleaned from the CoreAudioTypes.h file, both sets of format flags refer to the same 8.24 fixed-point format. Clearly something is different, but I can't figure out what.
Thanks for reading through this long question, and thanks in advance for any insight you can provide.
kAudioFormatFlagIsFloat means that the buffer contains floating point values. If mBitsPerChannel is 32 then you are dealing with float data (also called Float32), and if it is 64 you are dealing with double data.
kAudioFormatFlagsNativeEndian refers to the fact that the data in the buffer matches the endianness of the processor, so you don't have to worry about byte swapping.
kAudioFormatFlagIsPacked means that every bit in the data is significant. For example, if you store 24-bit audio data in 32 bits, this flag will not be set.
kAudioFormatFlagIsNonInterleaved means that each individual buffer consists of one channel of data. It is common for audio data to be interleaved, with the samples alternating between L and R channels: LRLRLRLR. For DSP applications it is often easier to deinterleave the data and work on one channel at a time, as the sketch below illustrates.
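A minimal sketch of what deinterleaving means in practice (the function and buffer names here are illustrative, not part of any Core Audio API):

// Split an interleaved stereo buffer (LRLRLR...) into two
// per-channel buffers.
void deinterleaveStereo( const Float32 * interleaved,
                         Float32 * left, Float32 * right,
                         int frames ) {
    int i;
    for( i = 0; i < frames; i++ ) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}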
I think in your case the error is that you are treating floating point data as fixed point. Float data is generally scaled to the interval [-1, +1). To convert float to SInt16 you need to multiply each sample by the maximum 16-bit value (1u << 15, 32768) and then clip to the interval [-32768, 32767].
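Putting that into code, here is a minimal sketch of float-aware replacements for your two conversion functions. They mirror your existing signatures; the function names are mine and this is untested against your render chain:

// convert sample vector from Float32 in [-1, +1) to SInt16, with clipping
void floatToSInt16( Float32 * source, SInt16 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        Float32 scaled = source[i] * 32768.0f;          // 1u << 15
        if( scaled >  32767.0f ) scaled =  32767.0f;    // clip high
        if( scaled < -32768.0f ) scaled = -32768.0f;    // clip low
        target[i] = (SInt16) scaled;
    }
}

// convert sample vector from SInt16 back to Float32 in [-1, +1)
void SInt16ToFloat( SInt16 * source, Float32 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (Float32) source[i] / 32768.0f;
    }
}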