 

Setting up an Audio Unit format and render callback for interleaved PCM audio

I'm currently attempting to play back audio which I receive in a series of UDP packets. These are decoded into PCM frames with the following properties:

  • 2 channels
  • interleaved
  • 2 bytes per sample in a single channel (so 4 bytes per frame)
  • with a sample rate of 48000 Hz.

Every UDP packet contains 480 frames, so the buffer's size is 480 * 2 (channels) * 2 (bytes per channel) = 1920 bytes.

I need to set up an Audio Unit to play back these packets. So, my first question is, how should I set up the AudioStreamBasicDescription struct for the Audio Unit? Looking at the documentation I'm not even sure if interleaved PCM is an acceptable format.

This is what I've got so far:

struct AudioStreamBasicDescription {
   Float64 mSampleRate;                 //48000
   UInt32  mFormatID;                   //?????
   UInt32  mFormatFlags;                //?????
   UInt32  mBytesPerPacket;             //Not sure what "packet" means here
   UInt32  mFramesPerPacket;            //Same as above
   UInt32  mBytesPerFrame;              //Same
   UInt32  mChannelsPerFrame;           //2?
   UInt32  mBitsPerChannel;             //16?
   UInt32  mReserved;                   //???
};
typedef struct AudioStreamBasicDescription  AudioStreamBasicDescription;

Secondly, after setting it up, I'm not sure how to get the frames from the UDP callback to the actual Audio Unit rendering function.

I currently have a callback function from the socket listener in which I generate the int16 * buffers that contain the audio I want to play. As I understand it, I also have to implement a render callback for the audio unit of the following form:

OSStatus RenderFrames(
    void                        *inRefCon,
    AudioUnitRenderActionFlags  *ioActionFlags,
    const AudioTimeStamp        *inTimeStamp,
    UInt32                      inBusNumber,
    UInt32                      inNumberFrames,
    AudioBufferList             *ioData)
{
    //No idea what I should do here.
    return noErr;
}

Putting it all together, I think what my socket reception callback should do is decode the frames, and put them in a buffer structure, so that the RenderFrames callback can fetch the frames from that buffer, and play them back. Is this correct? And if it is, once I fetch the next frame in the RenderFrames function, how do I actually "submit it" for playback?

Sergio Morales asked Jan 22 '13

1 Answer

Taking this a section at a time

AudioStreamBasicDescription

Apple's documentation for the ASBD covers these fields in detail. To clarify:

  • A frame of audio is a time-coincident set of audio samples. In other words, one sample per channel. For Stereo this is therefore 2.
  • For PCM formats, there is no packetisation: set mBytesPerPacket = mBytesPerFrame and mFramesPerPacket = 1.
  • mReserved isn't used and must be 0
  • Refer to the documentation for mFormatID and mFormatFlags. There is a handy helper function, CalculateLPCMFlags(), in CoreAudioTypes.h for computing the latter of these.
  • Multi-channel audio is generally interleaved (you can set a bit in mFormatFlags if you really don't want it to be).
  • There's another helper function in CoreAudioTypes.h that can fill out the entire ASBD for the common cases of linear PCM - FillOutASBDForLPCM() (see the sketch just after this list).
  • Lots of combinations of mFormatID and mFormatFlags are not supported by RemoteIO units - I found experimentation to be necessary on iOS.
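
For the question's format (16-bit signed integer, stereo, interleaved, 48 kHz), a call to FillOutASBDForLPCM() might look like the sketch below. This assumes the file is compiled as C++ (the helper takes a reference parameter); check the exact signature in your SDK's CoreAudioTypes.h.

AudioStreamBasicDescription asbd = {0};
// 48 kHz, 2 channels, 16 valid bits in a 16-bit sample,
// signed integer (not float), little-endian, interleaved
FillOutASBDForLPCM(asbd, 48000.0, 2, 16, 16, false, false, false);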

Here's some working code from one of my projects:

AudioStreamBasicDescription inputASBL = {0};

inputASBL.mSampleRate =          static_cast<Float64>(sampleRate);   // e.g. 48000
inputASBL.mFormatID =            kAudioFormatLinearPCM;
inputASBL.mFormatFlags =         kAudioFormatFlagIsPacked | kAudioFormatFlagIsSignedInteger;
inputASBL.mFramesPerPacket =     1;                                  // always 1 for LPCM
inputASBL.mChannelsPerFrame =    2;                                  // stereo
inputASBL.mBitsPerChannel =      sizeof(short) * 8;                  // 16-bit samples
inputASBL.mBytesPerPacket =      sizeof(short) * 2;                  // 2 interleaved channels
inputASBL.mBytesPerFrame =       sizeof(short) * 2;
inputASBL.mReserved =            0;
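
To apply that description to a RemoteIO unit, you set it on the input scope of the output element (bus 0), which is the scope your render callback feeds. A minimal sketch, assuming you have already created the unit and stored it in a variable called audioUnit (the name is just for illustration):

// Hypothetical: audioUnit was created earlier, e.g. via
// AudioComponentFindNext / AudioComponentInstanceNew with
// componentSubType = kAudioUnitSubType_RemoteIO.
OSStatus status = AudioUnitSetProperty(audioUnit,
                                       kAudioUnitProperty_StreamFormat,
                                       kAudioUnitScope_Input,
                                       0,                // output element (bus 0)
                                       &inputASBL,
                                       sizeof(inputASBL));
// Anything other than noErr usually means this format/flag
// combination isn't supported by the unit.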

Render Callbacks

CoreAudio operates what Apple describe as a pull model. That is to say, the render callback is called from a real-time thread when CoreAudio needs the buffer filling. From your question it appears you are expecting the opposite - pushing the data to the audio output.
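
Registering the render callback is done with AudioUnitSetProperty as well; after that, CoreAudio pulls audio by calling your function whenever it needs more. A sketch, again assuming an audioUnit variable and the RenderFrames function from the question:

AURenderCallbackStruct callbackInfo;
callbackInfo.inputProc       = RenderFrames;  // the callback from the question
callbackInfo.inputProcRefCon = NULL;          // or a pointer to your FIFO/state object

OSStatus status = AudioUnitSetProperty(audioUnit,
                                       kAudioUnitProperty_SetRenderCallback,
                                       kAudioUnitScope_Input,
                                       0,
                                       &callbackInfo,
                                       sizeof(callbackInfo));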

There are essentially two implementation choices:

  1. Perform non-blocking reads from the UDP socket in the render callback (as a general rule, anything you do in here should be fast and non-blocking).
  2. Maintain an audio FIFO into which samples are inserted as they are received, and from which the render callback consumes them.

The second is probably the better choice, but you are going to need to manage buffer over- and under-runs yourself.
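
As an illustration of the second option, here is a bare-bones single-producer/single-consumer ring buffer of interleaved int16_t samples: the UDP callback pushes decoded packets into it and the render callback pops from it. The type and function names (AudioFIFO, fifo_push, fifo_pop) are invented for this sketch, and the sizing and overrun policy are arbitrary.

#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical helper for this sketch: an SPSC ring buffer of
// interleaved int16_t samples. head/tail are monotonically
// increasing sample counts; indices are taken modulo kCapacity.
struct AudioFIFO {
    static constexpr size_t kCapacity = 48000 * 2;   // one second of stereo
    int16_t buffer[kCapacity];
    std::atomic<size_t> head{0};                     // samples written (producer)
    std::atomic<size_t> tail{0};                     // samples read (consumer)
};

// Called from the socket thread with one decoded packet
// (480 frames = 960 interleaved samples in the question's format).
static size_t fifo_push(AudioFIFO &f, const int16_t *samples, size_t count)
{
    size_t head  = f.head.load(std::memory_order_relaxed);
    size_t tail  = f.tail.load(std::memory_order_acquire);
    size_t space = AudioFIFO::kCapacity - (head - tail);
    if (count > space) count = space;                // overrun: drop the excess
    for (size_t i = 0; i < count; ++i)
        f.buffer[(head + i) % AudioFIFO::kCapacity] = samples[i];
    f.head.store(head + count, std::memory_order_release);
    return count;
}

// Called from the render callback; returns how many samples were available.
static size_t fifo_pop(AudioFIFO &f, int16_t *out, size_t count)
{
    size_t tail  = f.tail.load(std::memory_order_relaxed);
    size_t head  = f.head.load(std::memory_order_acquire);
    size_t avail = head - tail;
    if (count > avail) count = avail;                // underrun: caller fills the rest with silence
    for (size_t i = 0; i < count; ++i)
        out[i] = f.buffer[(tail + i) % AudioFIFO::kCapacity];
    f.tail.store(tail + count, std::memory_order_release);
    return count;
}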

The ioData argument points to a scatter-gather control structure. In the simplest case, it points to one buffer containing all of the frames, but it could contain several buffers that between them have sufficient frames to satisfy inNumberFrames. Normally, one pre-allocates a buffer big enough for inNumberFrames, copies samples into it and then modifies the AudioBufferList object pointed to by ioData to point to it.
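
Putting that together, a version of the question's RenderFrames might look like the sketch below. It assumes the interleaved 16-bit stereo format above and that inputProcRefCon was set to point at the AudioFIFO from the previous sketch (both assumptions of this illustration). For a RemoteIO output element the buffers in ioData are already allocated, so this simply fills ioData->mBuffers[0] and zero-pads on underrun rather than repointing mData.

#include <cstring>   // memset

OSStatus RenderFrames(
    void                        *inRefCon,
    AudioUnitRenderActionFlags  *ioActionFlags,
    const AudioTimeStamp        *inTimeStamp,
    UInt32                      inBusNumber,
    UInt32                      inNumberFrames,
    AudioBufferList             *ioData)
{
    AudioFIFO *fifo = static_cast<AudioFIFO *>(inRefCon);

    // Interleaved format: a single buffer holding
    // inNumberFrames * 2 (channels) int16_t samples.
    int16_t *out    = static_cast<int16_t *>(ioData->mBuffers[0].mData);
    size_t   wanted = static_cast<size_t>(inNumberFrames) * 2;

    size_t got = fifo_pop(*fifo, out, wanted);

    // Underrun: pad the remainder with silence rather than glitching.
    if (got < wanted)
        memset(out + got, 0, (wanted - got) * sizeof(int16_t));

    return noErr;
}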

In your application you could potentially use a scatter-gather approach on your decoded audio packets, allocating buffers as they are decoded. However, you don't always get the latency you want, and you might not be able to arrange for inNumberFrames to match the number of frames in your decoded UDP packets.

marko answered Oct 12 '22