I'm currently attempting to play back audio which I receive in a series of UDP packets. These are decoded into PCM frames with the following properties: 48 kHz sample rate, 16-bit signed samples, 2 interleaved channels.
Every UDP packet contains 480 frames, so the buffer's size is 480 * 2 (channels) * 2 (bytes per channel) = 1920 bytes.
I need to set up an Audio Unit to play back these packets. So, my first question is, how should I set up the AudioStreamBasicDescription struct for the Audio Unit? Looking at the documentation I'm not even sure if interleaved PCM is an acceptable format.
This is what I've got so far:
struct AudioStreamBasicDescription {
Float64 mSampleRate; //48000
UInt32 mFormatID; //?????
UInt32 mFormatFlags; //?????
UInt32 mBytesPerPacket; //Not sure what "packet" means here
UInt32 mFramesPerPacket; //Same as above
UInt32 mBytesPerFrame; //Same
UInt32 mChannelsPerFrame; //2?
UInt32 mBitsPerChannel; //16?
UInt32 mReserved; //???
};
typedef struct AudioStreamBasicDescription AudioStreamBasicDescription;
Secondly, after setting it up, I'm not sure how to get the frames from the UDP callback to the actual Audio Unit rendering function.
I currently have a callback function from the socket listener in which I generate the int16 * buffers that contain the audio I want to play. As I understand it, I also have to implement a render callback for the audio unit of the following form:
OSStatus RenderFrames(
void *inRefCon,
AudioUnitRenderActionFlags *ioActionFlags,
const AudioTimeStamp *inTimeStamp,
UInt32 inBusNumber,
UInt32 inNumberFrames,
AudioBufferList *ioData)
{
//No idea what I should do here.
return noErr;
}
Putting it all together, I think what my socket reception callback should do is decode the frames, and put them in a buffer structure, so that the RenderFrames callback can fetch the frames from that buffer, and play them back. Is this correct? And if it is, once I fetch the next frame in the RenderFrames function, how do I actually "submit it" for playback?
Taking this a section at a time:
Apple's documentation for the ASBD is here. To clarify:
- Interleaved linear PCM is a perfectly acceptable format (set kAudioFormatFlagIsNonInterleaved in mFormatFlags if you really don't want it to be).
- For linear PCM, mBytesPerPacket = mBytesPerFrame and mFramesPerPacket = 1, although I'm not sure whether the packet fields are actually ever used for uncompressed audio.
- mReserved isn't used and must be 0.
- For mFormatID and mFormatFlags: there is a handy helper function CalculateLPCMFlags in CoreAudioTypes.h for computing the latter of these.
- FillOutASBDForLPCM() is another helper that fills out the whole structure for the common cases of linear PCM.
- Not all combinations of mFormatID and mFormatFlags are supported by remoteIO units - I found experimentation to be necessary on iOS.
Here's some working code from one of my projects:
AudioStreamBasicDescription inputASBL = {0};
inputASBL.mSampleRate = static_cast<Float64>(sampleRate);
inputASBL.mFormatID = kAudioFormatLinearPCM;
inputASBL.mFormatFlags = kAudioFormatFlagIsPacked | kAudioFormatFlagIsSignedInteger;
inputASBL.mFramesPerPacket = 1;
inputASBL.mChannelsPerFrame = 2;
inputASBL.mBitsPerChannel = sizeof(short) * 8;
inputASBL.mBytesPerPacket = sizeof(short) * 2;
inputASBL.mBytesPerFrame = sizeof(short) * 2;
inputASBL.mReserved = 0;
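The size fields above follow directly from the channel count and sample width. As a sanity check on those relationships (mBytesPerFrame = channels * bytes per sample; mBytesPerPacket = mBytesPerFrame with mFramesPerPacket = 1 for uncompressed audio), here is the same arithmetic in portable C. The struct and function names are illustrative stand-ins, not CoreAudio API - in a real project you'd fill the real AudioStreamBasicDescription from <CoreAudio/CoreAudioTypes.h>.

```c
#include <stdint.h>

/* Stand-in for AudioStreamBasicDescription so this compiles anywhere. */
typedef struct {
    double   mSampleRate;
    uint32_t mFramesPerPacket;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
    uint32_t mBytesPerFrame;
    uint32_t mBytesPerPacket;
} LPCMDescription;

/* Fill the size fields from channel count and sample width, following the
 * rules for interleaved linear PCM stated above. */
static LPCMDescription MakeInterleavedLPCM(double sampleRate, uint32_t channels,
                                           uint32_t bitsPerChannel)
{
    LPCMDescription d;
    d.mSampleRate       = sampleRate;
    d.mChannelsPerFrame = channels;
    d.mBitsPerChannel   = bitsPerChannel;
    d.mBytesPerFrame    = channels * (bitsPerChannel / 8);
    d.mFramesPerPacket  = 1;                 /* uncompressed audio */
    d.mBytesPerPacket   = d.mBytesPerFrame;  /* one frame per packet */
    return d;
}
```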
CoreAudio operates what Apple describe as a pull model. That is to say, the render callback is called from a real-time thread when CoreAudio needs the buffer filled. From your question it appears you are expecting the opposite - pushing the data to the audio output.
There are essentially two implementation choices:
1. Block in the render callback until enough decoded audio is available (generally a bad idea on a real-time audio thread).
2. Maintain an audio FIFO into which the socket callback inserts decoded samples and from which the render callback consumes them.
The second is probably the better choice, but you are going to need to manage buffer over- and under-runs yourself.
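A minimal sketch of such a FIFO, as a single-producer/single-consumer ring buffer: the socket callback pushes decoded interleaved frames, the render callback pops them. The names and capacity are illustrative, not from any particular API, and a production version would need atomic (or otherwise synchronized) index updates between the two threads.

```c
#include <stdint.h>
#include <string.h>

#define RB_CAPACITY_FRAMES 4096   /* illustrative; must be a power of two */
#define RB_CHANNELS        2      /* interleaved stereo, as in the question */

typedef struct {
    int16_t  samples[RB_CAPACITY_FRAMES * RB_CHANNELS];
    uint32_t writeIndex;  /* total frames written (socket thread) */
    uint32_t readIndex;   /* total frames read (render thread)    */
} AudioFIFO;

/* Called from the UDP/socket callback: push decoded interleaved frames.
 * On overrun, excess frames are dropped. Returns frames actually stored. */
static uint32_t fifo_push(AudioFIFO *rb, const int16_t *frames, uint32_t nFrames)
{
    uint32_t freeFrames = RB_CAPACITY_FRAMES - (rb->writeIndex - rb->readIndex);
    if (nFrames > freeFrames) nFrames = freeFrames;
    for (uint32_t i = 0; i < nFrames; i++) {
        uint32_t slot = (rb->writeIndex + i) & (RB_CAPACITY_FRAMES - 1);
        memcpy(&rb->samples[slot * RB_CHANNELS],
               &frames[i * RB_CHANNELS],
               RB_CHANNELS * sizeof(int16_t));
    }
    rb->writeIndex += nFrames;
    return nFrames;
}

/* Called from the render callback: pop up to nFrames into out.
 * On underrun, the remainder is zero-filled (silence). Returns frames read. */
static uint32_t fifo_pop(AudioFIFO *rb, int16_t *out, uint32_t nFrames)
{
    uint32_t avail = rb->writeIndex - rb->readIndex;
    uint32_t take  = (nFrames < avail) ? nFrames : avail;
    for (uint32_t i = 0; i < take; i++) {
        uint32_t slot = (rb->readIndex + i) & (RB_CAPACITY_FRAMES - 1);
        memcpy(&out[i * RB_CHANNELS],
               &rb->samples[slot * RB_CHANNELS],
               RB_CHANNELS * sizeof(int16_t));
    }
    rb->readIndex += take;
    memset(&out[take * RB_CHANNELS], 0,
           (nFrames - take) * RB_CHANNELS * sizeof(int16_t));
    return take;
}
```

Zero-filling on underrun matters: the render callback must always deliver a full buffer, and silence is far less objectionable than stale or garbage samples.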
The ioData argument points to a scatter-gather control structure. In the simplest case, it points to one buffer containing all of the frames, but it could contain several buffers that between them hold sufficient frames to satisfy inNumberFrames. Normally, one pre-allocates a buffer big enough for inNumberFrames, copies samples into it and then modifies the AudioBufferList object pointed to by ioData to point to it.
In your application you could potentially use a scatter-gather approach on your decoded audio packets, allocating buffers as they are decoded. However, you might not get the latency you want, and you might not be able to arrange for inNumberFrames to be the same as the number of frames in a decoded UDP packet of audio.
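The copy-into-ioData step might look like the sketch below. The CoreAudio types are declared here in simplified form purely so the sketch is self-contained; in the real project they come from <AudioToolbox/AudioToolbox.h>, and the source of samples would be whatever buffer your socket callback fills (here it's just a plain array parameter, which a real render callback would instead fetch via inRefCon).

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the CoreAudio types (sketch only). */
typedef struct {
    uint32_t mNumberChannels;
    uint32_t mDataByteSize;
    void    *mData;
} AudioBuffer;

typedef struct {
    uint32_t    mNumberBuffers;
    AudioBuffer mBuffers[1];
} AudioBufferList;

typedef int32_t OSStatus;
enum { noErr = 0 };

enum { kChannels = 2 };  /* interleaved stereo, as in the question */

/* Copy up to inNumberFrames of decoded audio (src) into ioData's first
 * buffer, zero-padding on underrun so the hardware gets silence rather
 * than stale data. Assumes the simple one-buffer, interleaved case. */
static OSStatus FillRenderBuffer(const int16_t *src, uint32_t srcFrames,
                                 uint32_t inNumberFrames, AudioBufferList *ioData)
{
    int16_t *dst  = (int16_t *)ioData->mBuffers[0].mData;
    uint32_t take = (srcFrames < inNumberFrames) ? srcFrames : inNumberFrames;

    memcpy(dst, src, take * kChannels * sizeof(int16_t));
    memset(dst + take * kChannels, 0,
           (inNumberFrames - take) * kChannels * sizeof(int16_t));

    ioData->mBuffers[0].mDataByteSize =
        inNumberFrames * kChannels * sizeof(int16_t);
    return noErr;
}
```

A render callback like the RenderFrames stub in the question would call something of this shape, pulling src from the FIFO fed by the UDP callback, and returning noErr to "submit" the filled ioData for playback - that return is all the submission there is in the pull model.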