
Multithreaded Realtime audio programming - To block or Not to block

When writing audio software, many people on the internet say it is paramount not to use either memory allocation or blocking code, i.e. no locks, because these are non-deterministic and could cause the output buffer to underflow, making the audio glitch.

Real Time Audio Programming

When I write video software, I generally use both, i.e. allocating video frames on the heap and passing them between threads using locks and condition variables (bounded buffers). I love the power this provides, as a separate thread can be used for each operation, allowing the software to max out each of the cores and giving the best performance.

With audio I'd like to do something similar, passing frames of maybe 100 samples between threads. However, there are two issues.

  1. How do I generate the frames without using memory allocation? I suppose I could use a pool of frames that have been pre-allocated but this seems messy.

  2. I'm aware you can use a lock-free queue and that Boost has a nice library to do this. This would be a great way to share data between threads, but constantly polling the queue to see if data is available seems like a massive waste of CPU time.

In my experience, using mutexes doesn't actually take much time at all, provided that the section where the mutex is locked is short.

What is the best way to achieve passing audio frames between threads, whilst keeping latency to a minimum, not wasting resources and using relatively little non-deterministic behaviour?

SvaLopLop asked Jan 02 '15 07:01



1 Answer

Seems like you did your research! You've already identified the two main problems that could be the root cause of audio glitches. The question is: how much of this was important 10 years ago and is only folklore and cargo-cult programming these days?

My two cents:

1. Heap allocations in the rendering loop:

These can have quite a lot of overhead, depending on how small your processing chunks are. The main culprit is that very few runtimes have a per-thread heap, so each time you touch the heap your performance depends on what the other threads in your process are doing. If, for example, a GUI thread is currently deleting thousands of objects and you access the heap from the audio rendering thread at the same time, you may experience a significant delay.

Writing your own memory management with pre-allocated buffers may sound messy, but in the end it's just two functions that you can hide somewhere in a utility source file. Since you usually know your allocation sizes in advance, there is a lot of opportunity to fine-tune and optimize your memory management. You can store your free segments as a simple linked list, for example. If done right, this has the benefit that you allocate the most recently used buffer again. This buffer has a very high probability of being in the cache.
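As a rough illustration of those "two functions", here is a minimal sketch of a fixed-size pool with an intrusive free list. The names `Frame`, `FramePool`, `acquire` and `release` are made up for this example, the frame size of 100 samples is taken from the question, and the pool is not thread-safe: it assumes one thread does all acquiring and releasing (or that frames travel back to that thread through a queue).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical frame type: 100 samples, as in the question.
struct Frame {
    std::array<float, 100> samples{};
    Frame* next = nullptr;  // free-list link, only meaningful while the frame is free
};

// Fixed-size pool: all memory is allocated once, in the constructor.
// Not thread-safe; assumes a single thread acquires and releases.
class FramePool {
public:
    explicit FramePool(std::size_t count) : storage_(count) {
        assert(count > 0);
        for (Frame& f : storage_) {  // thread every frame onto the free list
            f.next = free_;
            free_ = &f;
        }
    }

    // Pops the most recently released frame (LIFO), or nullptr if none is free.
    Frame* acquire() {
        Frame* f = free_;
        if (f) {
            free_ = f->next;
            f->next = nullptr;
        }
        return f;
    }

    // Pushes a frame back onto the free list; no heap call is involved.
    void release(Frame* f) {
        f->next = free_;
        free_ = f;
    }

private:
    std::vector<Frame> storage_;  // allocated once, never resized
    Frame* free_ = nullptr;       // head of the free list
};
```

Because the free list is a stack, `acquire()` hands back whichever frame was released last, which is the cache-friendly behaviour described above.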

If fixed-size allocators don't work for you, have a look at ring buffers. They fit the use cases of streaming audio very well.
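To make the ring-buffer idea concrete, here is a minimal single-producer/single-consumer sketch built on `std::atomic`. The class name `SpscRing` and its interface are assumptions for this example, not something from a particular library, and it is only safe with exactly one pushing thread and one popping thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Single-producer / single-consumer ring buffer of samples.
// One slot is always kept empty to tell "full" apart from "empty".
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false when the buffer is full.
    bool push(float v) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;
        buf_[head] = v;
        head_.store(next, std::memory_order_release);  // publish the sample
        return true;
    }

    // Consumer thread only. Returns false when the buffer is empty.
    bool pop(float& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;
        out = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);
        return true;
    }

private:
    std::vector<float> buf_;            // allocated once, never resized
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

Neither `push` nor `pop` ever blocks or allocates, so the caller decides what to do on a full or empty buffer (drop, retry later, or output silence).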

2. To lock, or not to lock:

I'd say that these days mutex and semaphore locks are fine if you can estimate that you take fewer than 1000 to 5000 of them per second (on a PC; things are different on something like a Raspberry Pi). If you stay below that range, it is unlikely that the overhead shows up in a performance profile.

Translated to your use case: if, for example, you work with 48 kHz audio and 100-sample chunks, you process 480 chunks per second, which means roughly 960 lock/unlock operations per second in a simple two-thread consumer/producer pattern. That is well within the range. If you completely max out the rendering thread, the locking will not show up in a profile. If, on the other hand, you only use something like 5% of the available processing power, the locks may show up, but you will not have a performance problem either :-)
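For reference, the arithmetic and the producer/consumer pattern could look roughly like the sketch below. `locksPerSecond` and `BoundedQueue` are names invented for this example; the queue uses a pre-sized `std::vector` as storage so that `push` and `pop` do not touch the heap for pointer-like element types, and it assumes `T` is default-constructible.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Each chunk is locked once by the producer and once by the consumer.
constexpr int locksPerSecond(int sampleRate, int chunkSize) {
    return 2 * (sampleRate / chunkSize);
}
static_assert(locksPerSecond(48000, 100) == 960, "480 chunks/s, two locks each");

// Bounded blocking queue: mutex + two condition variables, fixed storage.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the queue is full.
    void push(T item) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return count_ < slots_.size(); });
        slots_[(head_ + count_) % slots_.size()] = std::move(item);
        ++count_;
        lock.unlock();  // keep the locked section short
        notEmpty_.notify_one();
    }

    // Blocks while the queue is empty.
    T pop() {
        std::unique_lock<std::mutex> lock(m_);
        notEmpty_.wait(lock, [this] { return count_ > 0; });
        T item = std::move(slots_[head_]);
        head_ = (head_ + 1) % slots_.size();
        --count_;
        lock.unlock();
        notFull_.notify_one();
        return item;
    }

private:
    std::mutex m_;
    std::condition_variable notFull_, notEmpty_;
    std::vector<T> slots_;  // allocated once in the constructor
    std::size_t head_ = 0;  // index of the oldest element
    std::size_t count_ = 0; // number of stored elements
};
```

A waiting consumer sleeps inside `pop()` instead of polling, which addresses the CPU-time concern from the question at the cost of one short critical section per chunk.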

Going lock-less is also an option, but so are hybrid solutions that first do some lock-less tries and then fall back to hard locking. You'll get the best of both worlds that way. There is a lot of good stuff to read about this topic on the net.
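One possible reading of such a hybrid, sketched under my own assumptions: try to take the mutex without blocking a bounded number of times, and only then fall back to a blocking `lock()`. The function name `spinThenLock` and the default of 100 attempts are arbitrary choices for this example, not tuned values.

```cpp
#include <cassert>
#include <mutex>

// Tries the non-blocking path up to maxSpins times, then blocks.
// The returned unique_lock always owns the mutex.
inline std::unique_lock<std::mutex> spinThenLock(std::mutex& m, int maxSpins = 100) {
    assert(maxSpins >= 0);
    std::unique_lock<std::mutex> lock(m, std::defer_lock);
    for (int i = 0; i < maxSpins; ++i) {
        if (lock.try_lock()) return lock;  // fast path: the thread never slept
    }
    lock.lock();  // slow path: let the OS block the thread until the mutex is free
    return lock;
}
```

When the critical section is short, the fast path usually succeeds after a few attempts; when the other thread holds the lock for longer, the fallback keeps the waiting thread from spinning indefinitely.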

In any case:

You should gently raise the thread priority of your non-GUI threads to make sure that if they run into a lock, they get out of it quickly. It is also a good idea to read up on what priority inversion is and what you can do to avoid it:

https://en.wikipedia.org/wiki/Priority_inversion

Nils Pipenbrinck answered Oct 08 '22 00:10
