I'm interested in implementing an audio editor in C or C++ on Windows and Linux. I can't figure out how to display the waveform quickly enough in its fully zoomed-out view. I'm not looking for information about fast frame-buffer techniques; this is a question about the algorithms and data structures needed to efficiently determine what to display.
Say I want to be able to edit a 5-channel, 48 kHz, 24-bit sound that is 2 hours long. That's 5 gigabytes of sample data. I want to be able to zoom out from one pixel per sample all the way until all the sample data is visible at once. I want the application to feel responsive, even on a slow machine, like, for argument's sake, a 1 GHz Atom. When I say responsive, I'd like the GUI updates to generally occur within 1/30th of a second of the user input.
A naive implementation would scan every sample in the whole waveform when deciding what to render for the fully zoomed-out view: it needs to find the max and min sample values for all samples "covered" by each pixel width of the display. I wrote a simple app to test the speed of this approach. I tested with a 1-hour, mono, 16-bit, 44.1 kHz sample on my 2015 3.5 GHz Xeon. It takes 0.12 seconds. Scaled up to the 5 GB, 5-channel file and the 1 GHz Atom target, that is hundreds of times too slow.
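For concreteness, here is a minimal sketch of that naive scan, assuming 16-bit mono samples already in memory (the function name and layout are just for illustration, not from any particular codebase):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MinMax { int16_t min; int16_t max; };

// Naive approach: for every pixel column, find the extremes of all samples
// it covers by visiting every sample once per redraw, i.e. O(total samples).
std::vector<MinMax> naive_column_extremes(const std::vector<int16_t>& samples,
                                          std::size_t pixel_width)
{
    std::vector<MinMax> out(pixel_width, MinMax{INT16_MAX, INT16_MIN});
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        // map the sample index to the pixel column that covers it
        std::size_t x = (i * pixel_width) / samples.size();
        out[x].min = std::min(out[x].min, samples[i]);
        out[x].max = std::max(out[x].max, samples[i]);
    }
    return out;
}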
You could imagine maintaining a cache of zoomed-out data, but I can't see how to avoid recalculating the entire cache after most inserts or deletes. It feels like there must be a better way.
Here's a diagram showing what I want to achieve:
This is how the display in most currently available audio editors works, so users are likely to expect this behaviour. I tested with Audacity, and it works this way (although it also shows something like the mean of the samples in a lighter colour). It can handle arbitrary inserts into large sounds, seemingly instantly. I'm not going to read the 75 megabytes of source code to find out how it does it.
EDIT:
Various people have suggested schemes that involve only considering a subset of the samples when showing the zoomed out view. I've come to the conclusion that I don't want to do that because it loses too much useful information. For example, including all the samples is important if you are looking for a glitch in the sound, like a click in a vinyl conversion. In the worst case, if the glitch is only one sample long, I still want a guarantee that it is shown in the fully zoomed out view.
After reading Peter Stock's answer, I've come up with the following scheme. I think it will allow the display to be calculated about 500 times faster than the naive scheme, and it shouldn't add any noticeable cost to inserts or deletes. The memory overhead is less than 1%.
The sound data will be allocated in blocks of 131072 samples, so that inserts and deletes don't require the entire sound to be reallocated and copied. When the sound is first loaded, each block will be completely filled (except probably the last one). Inserts and deletes will lead to a kind of fragmentation. For simplicity, I will arrange for the start of each block to always contain valid sample data, and any gaps will be at the end of the block.
Each block has two look-up tables associated with it, one for max values and one for min. Each item in the look-up tables corresponds to 1024 samples.
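A minimal sketch of that layout, assuming 32-bit sample storage and the block and granule sizes given above (all names here are illustrative):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockSamples = 131072; // samples per block
constexpr std::size_t kGranule      = 1024;   // samples per look-up entry

struct Block
{
    std::vector<int32_t> samples;   // up to kBlockSamples; valid data packed at the front
    std::size_t          valid = 0; // number of valid samples in this block
    std::vector<int32_t> lut_max;   // kBlockSamples / kGranule entries
    std::vector<int32_t> lut_min;

    // Recompute the look-up entries covering granule g, e.g. after an edit.
    // A part-empty granule is summarised over only the samples present.
    void refresh_granule(std::size_t g)
    {
        std::size_t begin = g * kGranule;
        std::size_t end   = std::min(begin + kGranule, valid);
        int32_t hi = INT32_MIN, lo = INT32_MAX;
        for (std::size_t i = begin; i < end; ++i)
        {
            hi = std::max(hi, samples[i]);
            lo = std::min(lo, samples[i]);
        }
        lut_max[g] = hi;
        lut_min[g] = lo;
    }
};

Two 4-byte look-up entries per 1024 samples of 3-byte data is roughly 0.26% overhead, which is consistent with the under-1% figure above.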
The diagram below shows how to calculate the max value for one pixel width of the display. It shows a few blocks relevant to the calculation. It assumes there is no "fragmentation".
After an insert, the situation is slightly more complicated. Two blocks now have invalid regions at their ends, so there are entries in the max look-up table that correspond to a partly empty region of samples. The values for these entries are found by simply taking the max of the samples that are present.
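Continuing the sketch above (same illustrative Block type and constants), the maximum for one pixel width within a block could be assembled from whole look-up entries plus a raw scan over the partial granules at either end. A caller would take the max of this value over every block the pixel's sample range touches; the min side is symmetric.

// Max over the sample range [first, last) of one block: use lut_max for
// granules that the range fully covers, and scan raw samples at the ends.
int32_t block_range_max(const Block& b, std::size_t first, std::size_t last)
{
    last = std::min(last, b.valid);
    int32_t hi = INT32_MIN;

    // leading partial granule: scan raw samples up to the next granule boundary
    while (first < last && first % kGranule != 0)
        hi = std::max(hi, b.samples[first++]);

    // fully covered granules: one look-up each instead of 1024 sample reads
    while (first + kGranule <= last)
    {
        hi = std::max(hi, b.lut_max[first / kGranule]);
        first += kGranule;
    }

    // trailing partial granule
    while (first < last)
        hi = std::max(hi, b.samples[first++]);

    return hi;
}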
When the zoom is at the point where you have multiple samples per pixel, it is not worth accurately calculating the mean sample value for each pixel. The user can't align the GUI tooling accurately at that level of zoom, so there's no benefit. The user just needs a qualitative view.
I would just select one sample per screen pixel for the window area, skipping over the unnecessary samples.
Something like this completely untested code:
#include <cstddef>
#include <vector>

std::vector<double> samples(1024 * 1024); // normalized samples, -1.0 <= s <= 1.0

int window_x = 1024; // window width in pixels
int window_y = 768;  // window height in pixels

// visit every window pixel column
for (int x = 0; x < window_x; ++x)
{
    // select the relevant sample for the current screen pixel x
    double s = samples[(static_cast<std::size_t>(x) * samples.size()) / window_x];

    // convert the sample value to a half-height in pixels
    int y = static_cast<int>((window_y / 2) * s);

    // draw a vertical line centred on the window's horizontal midline
    // (gd stands in for whatever drawing context your GUI toolkit provides)
    gd.draw_line(x, (window_y / 2) - y, x, (window_y / 2) + y);
}
Obviously you also need to account for window scrolling etc...