I'm interested in implementing an audio editor in C or C++ on Windows and Linux. I can't figure out how to display the waveform quickly enough in its fully zoomed-out view. I'm not looking for information about fast frame-buffer techniques; this is a question about the algorithms and data structures needed to efficiently determine what to display.
Say I want to be able to edit a 5-channel, 48 kHz, 24-bit sound that is 2 hours long. That's 5 gigabytes of sample data. I want to be able to zoom out from one pixel per sample all the way until all the sample data is visible at once. I want the application to feel responsive, even on a slow machine, like, for argument's sake, a 1 GHz Atom. When I say responsive, I'd like the GUI updates to generally occur within 1/30th of a second of the user input.
A naive implementation would scan every sample in the whole waveform when deciding what to render for the fully zoomed-out view: it needs to find the max and min sample values for all samples "covered" by each pixel width of the display. I wrote a simple app to test the speed of this approach. I tested with a 1-hour, mono, 16-bit, 44.1 kHz sample on my 2015 3.5 GHz Xeon. It takes 0.12 seconds. Scaled up to the 5 GB, 5-channel file and the 1 GHz Atom target, that is hundreds of times too slow.
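For concreteness, here is a minimal sketch of that naive scan, assuming 16-bit mono samples already in memory (the function name and layout are just for illustration, not from any particular codebase):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MinMax { int16_t min; int16_t max; };

// Naive approach: for every pixel column, find the extremes of all samples
// it covers by visiting every sample once per redraw, i.e. O(total samples).
std::vector<MinMax> naive_column_extremes(const std::vector<int16_t>& samples,
                                          std::size_t pixel_width)
{
    std::vector<MinMax> out(pixel_width, MinMax{INT16_MAX, INT16_MIN});
    for (std::size_t i = 0; i < samples.size(); ++i)
    {
        // map the sample index to the pixel column that covers it
        std::size_t x = (i * pixel_width) / samples.size();
        out[x].min = std::min(out[x].min, samples[i]);
        out[x].max = std::max(out[x].max, samples[i]);
    }
    return out;
}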
You could imagine maintaining a cache of zoomed-out data, but I can't see how to avoid recalculating the entire cache after most inserts or deletes. It feels like there must be a better way.
Here's a diagram showing what I want to achieve:
This is how the display in most currently available audio editors works, so users are likely to expect this behaviour. I tested with Audacity, and it works this way (although it also shows something like the mean of the samples in a lighter colour). It can handle arbitrary inserts into large sounds, seemingly instantly. I'm not going to read the 75 megabytes of source code to find out how it does it.
EDIT:
Various people have suggested schemes that involve only considering a subset of the samples when showing the zoomed out view. I've come to the conclusion that I don't want to do that because it loses too much useful information. For example, including all the samples is important if you are looking for a glitch in the sound, like a click in a vinyl conversion. In the worst case, if the glitch is only one sample long, I still want a guarantee that it is shown in the fully zoomed out view.
After reading Peter Stock's answer, I've come up with the following scheme. I think it will allow the display to be calculated about 500 times faster than the naive scheme, and it shouldn't add any noticeable cost to inserts or deletes. The memory overhead is less than 1%.
The sound data will be allocated in blocks of 131072 samples, so that inserts and deletes don't require the entire sound to be reallocated and copied. When the sound is first loaded, each block will be completely filled (except probably the last one). Inserts and deletes will lead to a kind of fragmentation. For simplicity, I will arrange for the start of each block to always contain valid sample data, and any gaps will be at the end of the block.
Each block has two look-up tables associated with it, one for max values and one for min. Each item in the look-up tables corresponds to 1024 samples.
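A minimal sketch of that layout, assuming 32-bit sample storage and the block and granule sizes given above (all names here are illustrative):

#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockSamples = 131072; // samples per block
constexpr std::size_t kGranule      = 1024;   // samples per look-up entry

struct Block
{
    std::vector<int32_t> samples;   // up to kBlockSamples; valid data packed at the front
    std::size_t          valid = 0; // number of valid samples in this block
    std::vector<int32_t> lut_max;   // kBlockSamples / kGranule entries
    std::vector<int32_t> lut_min;

    // Recompute the look-up entries covering granule g, e.g. after an edit.
    // A part-empty granule is summarised over only the samples present.
    void refresh_granule(std::size_t g)
    {
        std::size_t begin = g * kGranule;
        std::size_t end   = std::min(begin + kGranule, valid);
        int32_t hi = INT32_MIN, lo = INT32_MAX;
        for (std::size_t i = begin; i < end; ++i)
        {
            hi = std::max(hi, samples[i]);
            lo = std::min(lo, samples[i]);
        }
        lut_max[g] = hi;
        lut_min[g] = lo;
    }
};

Two 4-byte look-up entries per 1024 samples of 3-byte data is roughly 0.26% overhead, which is consistent with the under-1% figure above.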
The diagram below shows how to calculate the max value for one pixel width of the display. It shows a few blocks relevant to the calculation. It assumes there is no "fragmentation".
After an insert, the situation is slightly more complicated. Two blocks now have invalid regions at their ends, so there are entries in the max look-up table that correspond to a partly empty region of samples. The values for these entries are found by simply taking the max of the samples that are present.
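Continuing the sketch above (same illustrative Block type and constants), the maximum for one pixel width within a block could be assembled from whole look-up entries plus a raw scan over the partial granules at either end. A caller would take the max of this value over every block the pixel's sample range touches; the min side is symmetric.

// Max over the sample range [first, last) of one block: use lut_max for
// granules that the range fully covers, and scan raw samples at the ends.
int32_t block_range_max(const Block& b, std::size_t first, std::size_t last)
{
    last = std::min(last, b.valid);
    int32_t hi = INT32_MIN;

    // leading partial granule: scan raw samples up to the next granule boundary
    while (first < last && first % kGranule != 0)
        hi = std::max(hi, b.samples[first++]);

    // fully covered granules: one look-up each instead of 1024 sample reads
    while (first + kGranule <= last)
    {
        hi = std::max(hi, b.lut_max[first / kGranule]);
        first += kGranule;
    }

    // trailing partial granule
    while (first < last)
        hi = std::max(hi, b.samples[first++]);

    return hi;
}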
When the zoom is at the point where you have multiple samples per pixel, it is not worth accurately calculating the mean sample value for each pixel. The user can't align the GUI tooling accurately at that level of zoom, so there's no benefit. The user just needs a qualitative view.
I would just select one sample per screen pixel for the window area, skipping over the unnecessary samples.
Something like this completely untested code:
#include <cstddef>
#include <vector>

std::vector<double> samples(1024 * 1024); // normalized samples, -1.0 <= s <= 1.0

int window_x = 1024; // window width in pixels
int window_y = 768;  // window height in pixels

// visit every window pixel column
for (int x = 0; x < window_x; ++x)
{
    // select the relevant sample for the current screen pixel x
    double s = samples[(static_cast<std::size_t>(x) * samples.size()) / window_x];

    // convert the sample value to a half-height in pixels
    int y = static_cast<int>((window_y / 2) * s);

    // draw a vertical line centred on the window's horizontal midline
    // (gd stands in for whatever drawing context your GUI toolkit provides)
    gd.draw_line(x, (window_y / 2) - y, x, (window_y / 2) + y);
}
Obviously you also need to account for window scrolling etc...