First, some background info: I'm writing a MacOS/X application that uses CoreAudio to receive an audio signal from a CoreAudio device's input stream, do some real-time processing on the audio, and then send it back to that CoreAudio device's output stream for the user to hear.
This application uses the lower-level CoreAudio APIs (i.e. AudioDeviceAddIOProc, AudioDeviceStart, etc -- not AudioUnits) to grab exclusive access to a user-specified CoreAudio device, set it to the desired sample rate (96kHz), and do its thing. It works very well, and I'm quite happy with its performance.
However, my program currently has a limitation -- it can only use a single CoreAudio device at a time. What I'd like to do is extend my application so that the user can choose his "input CoreAudio device" and his "output CoreAudio device" independently of each other, rather than being restricted to using only a single CoreAudio device that supplies both the input audio source and the output audio sink.
My question is, what is the recommended technique for doing this? I can require that both CoreAudio devices be settable to the same sample-rate, but even once I do that, I think I will have to handle various issues, such as:
Integrating the separate AudioDeviceStart()-initiated callbacks from the two devices, which I suspect will not be called in any well-defined order, and might even be called concurrently with respect to each other(?). I would need to pass audio from one callback to the other somehow, ideally without significantly increasing audio latency.
Handling differences in the sample-clock rates of the two devices. E.g. even if both devices are nominally set to 96kHz sample rate, I suspect it may actually be the case that e.g. the upstream device is producing samples at 95.99999kHz while the downstream device is consuming them at 96.000001kHz (or vice-versa), and that would eventually cause me to end up with either "not enough" or "too many" samples to feed the downstream device during a given rendering-callback, causing a glitch.
Any other gotchas that I haven't considered yet
How do other MacOS/X programs handle these issues?
A while ago I played with a proof-of-concept audio-mixer playground in C. None of it is finished, but things do actually work. The library uses the lowest-level Core Audio API available, so it indeed works with things like AudioDeviceCreateIOProcID and AudioObjectAddPropertyListener.
In short, this playground allows me to use multiple audio devices known to macOS and route one or more audio streams between them while passing through different kinds of "nodes" along the way (think of a matrix mixer node, for example).
AudioDeviceStart()-initiated callbacks will each fire from a different (random) thread. Also, the callbacks will not be called in a deterministic order. I also found that the timing between the callbacks can vary a lot (seemingly depending on the audio device providing/asking for data).
To solve this problem I used a lock-free ring buffer (that is, one based on atomic counters).
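For illustration, a minimal single-producer/single-consumer ring buffer along those lines might look like the sketch below. This is not the playground's actual code; the capacity, the float sample type, and the function names are assumptions.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RB_CAPACITY 8192            /* must be a power of two */

typedef struct {
    float            data[RB_CAPACITY];
    _Atomic size_t   write_pos;     /* only advanced by the producer (input IOProc)  */
    _Atomic size_t   read_pos;      /* only advanced by the consumer (output IOProc) */
} ring_buffer;

/* Producer side: called from the input device's IOProc. Returns false on overrun. */
static bool rb_write(ring_buffer *rb, const float *src, size_t frames)
{
    size_t w = atomic_load_explicit(&rb->write_pos, memory_order_relaxed);
    size_t r = atomic_load_explicit(&rb->read_pos,  memory_order_acquire);
    if (RB_CAPACITY - (w - r) < frames)
        return false;                               /* not enough free space */
    for (size_t i = 0; i < frames; i++)
        rb->data[(w + i) & (RB_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&rb->write_pos, w + frames, memory_order_release);
    return true;
}

/* Consumer side: called from the output device's IOProc. Returns false on underrun. */
static bool rb_read(ring_buffer *rb, float *dst, size_t frames)
{
    size_t r = atomic_load_explicit(&rb->read_pos,  memory_order_relaxed);
    size_t w = atomic_load_explicit(&rb->write_pos, memory_order_acquire);
    if (w - r < frames)
        return false;                               /* not enough data yet */
    for (size_t i = 0; i < frames; i++)
        dst[i] = rb->data[(r + i) & (RB_CAPACITY - 1)];
    atomic_store_explicit(&rb->read_pos, r + frames, memory_order_release);
    return true;
}
```

The positions are monotonically increasing counters (masked on access), so the single writer and single reader never contend on the same variable and no locks are needed in the audio callbacks.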
Your concern about different clock domains is very real. Two devices both set to 96kHz will still run at slightly different speeds. This can go well for a long time, but eventually one of them is going to run out of data and start to glitch. If the devices are not externally synchronised together (using, for example, word clock or PTP), they'll each run in their own time domain. To pass audio between different time domains you'll have to asynchronously sample-rate-convert the audio data, and the SRC will need to be able to convert by very small ratios and adjust along the way. One library that does this very well is Soxr. In the world of Core Audio there is also a VarispeedNode, which allows you to do basically the same thing. The big disadvantage of the async-SRC solution is the latency it introduces, although you may be able to specify "low latency".
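As a rough illustration of the drift tracking involved (not my code; the struct and field names are assumptions), something like this can estimate the effective ratio between the two clocks from frame counters updated in each IOProc, and that ratio is what you would then feed to the async SRC (e.g. soxr's variable-rate mode or a Varispeed node):

```c
#include <stdint.h>
#include <stdatomic.h>

typedef struct {
    atomic_uint_fast64_t in_frames;   /* incremented by the input device's IOProc  */
    atomic_uint_fast64_t out_frames;  /* incremented by the output device's IOProc */
} clock_drift;

/* Both devices run over the same wall-clock time, so the ratio of frames
 * delivered vs. frames requested converges on the real ratio of their clocks
 * (e.g. 95.99999 kHz vs. 96.000001 kHz gives a ratio just under 1.0). */
static double estimated_ratio(clock_drift *d)
{
    uint64_t in  = atomic_load(&d->in_frames);
    uint64_t out = atomic_load(&d->out_frames);
    return out ? (double)in / (double)out : 1.0;
}
```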
In your case the synchronisation of the different audio devices will be the biggest challenge. In my case I found the callbacks of different audio devices varied too much to select one as the "clock master", so I ended up creating a standalone time domain by carefully timing the execution of the processing cycle. For this I used low-level timing mechanisms like mach_wait_until() and mach_absolute_time() (there's not much documentation on those).
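A sketch of what such a self-clocked processing cycle can look like follows; the 96kHz rate, 512-frame block size, and function names are illustrative assumptions, not my actual code:

```c
#include <mach/mach_time.h>
#include <stdint.h>

void run_processing_loop(void (*process_one_block)(void), volatile int *keep_running)
{
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);                      /* converts Mach ticks <-> nanoseconds */

    const double   block_seconds = 512.0 / 96000.0;
    const uint64_t block_ns      = (uint64_t)(block_seconds * 1e9);
    const uint64_t block_ticks   = block_ns * tb.denom / tb.numer;

    uint64_t deadline = mach_absolute_time() + block_ticks;
    while (*keep_running) {
        process_one_block();                      /* pull from inputs, push to outputs */
        mach_wait_until(deadline);                /* sleep until the next cycle boundary */
        deadline += block_ticks;                  /* advance absolutely: no cumulative drift */
    }
}
```

Advancing the deadline by a fixed number of ticks (rather than re-reading the clock each pass) keeps the cycle from slowly drifting due to scheduling jitter.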
However, there might be another solution. Looking at the documentation in AudioHardware.h from the CoreAudio framework, there seems to be a way to create an aggregate device programmatically using AudioHardwareCreateAggregateDevice(). This lets macOS handle the synchronisation of the different audio devices. Also note the kAudioAggregateDeviceIsPrivateKey key, which allows you to create an aggregate device without publishing it to the whole system, so the device won't show up in Audio MIDI Setup (I think). Please also note that this key makes the aggregate disappear when the process that created it stops running. It might or might not be what you need, but this would be a very robust way of using multiple audio devices.
If I were to write the software again, I definitely would look into this way of doing the synchronisation.
In general when dealing with low-latency audio you want to achieve the most deterministic behaviour possible. But I'm sure you are aware of this.
Another gotcha is that the documentation for the Core Audio API is not available on Apple's developer website (https://developer.apple.com/documentation/coreaudio/core_audio_functions?language=objc). For that you'll have to dive into the headers of the Core Audio framework, where you'll find a lot of useful documentation about using the API.
On my machine the headers are located at: /Library/Developer/CommandLineTools/SDKs/MacOSX10.15.sdk/System/Library/Frameworks/CoreAudio.framework/Versions/A/Headers
Some helpful reading:
http://atastypixel.com/blog/four-common-mistakes-in-audio-development
http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing
https://developer.apple.com/library/archive/qa/qa1467/_index.html
The "leaky bucket" algorithm combined with a fractional interpolating resampler can be used to dynamically adjust very slight (and non-constant!) sample rate differences. Bigger jumps or skips in rates usually require more complicated error concealment strategies. Lots of variations on lock-free circular/ring buffers using atomic primitives to pass data between async audio threads. I use mach timers or the CADisplay link timer to drive UI polling threads (for controls, displays, etc.). I usually try to start the output first, and fill it with silence until the input starts supplying samples, then cross fade it in. Then cross fade out to silence again after the input stops.