In trying to write an optimized DSP algorithm, I was wondering about relative speed between stack allocation and heap allocation, and size limits of stack-allocated arrays. I realize there is a stack frame size limit, but I don't understand why the following runs, generating seemingly realistic benchmark results with cargo bench, but fails with a stack overflow when run with cargo test --release.
#![feature(test)]
extern crate test;

#[cfg(test)]
mod tests {
    use test::Bencher;

    #[bench]
    fn it_works(b: &mut Bencher) {
        b.iter(|| {
            let stack = [[[0.0; 2]; 512]; 512];
        });
    }
}
To put things into perspective, note that your array occupies 8 × 2 × 512 × 512 bytes = 4 MiB.
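As a quick check of that arithmetic, here is a minimal sketch that asks the compiler for the size of the array type. It assumes the 0.0 literals default to f64 (8 bytes each), which is what Rust infers when nothing else constrains the type; the names Grid and grid_bytes are made up for this example.

```rust
use std::mem::size_of;

// The array type from the question: 512 x 512 pairs of f64.
// `Grid` is a hypothetical alias introduced only for this sketch.
type Grid = [[[f64; 2]; 512]; 512];

/// Size of the array in bytes, as computed by the compiler.
fn grid_bytes() -> usize {
    size_of::<Grid>()
}

fn main() {
    let bytes = grid_bytes();
    // 8 bytes per f64, times 2 * 512 * 512 elements.
    assert_eq!(bytes, 8 * 2 * 512 * 512);
    println!("{} bytes = {} MiB", bytes, bytes / (1024 * 1024));
}
```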
cargo test crashes but cargo bench doesn't because a "test" calls the function it_works() in a new thread, while "bench" calls it in the main thread.
The default stack size of the main thread is typically 8 MiB, so that array is going to occupy half of the available stack. That's a lot, but there's still room available, so the benchmark runs normally.
The stack size of a new thread, however, is typically much smaller. On Linux it is 2 MiB, and other platforms could be even smaller. So your 4 MiB array does not fit in the thread's stack, which causes a stack overflow / segfault.
You can increase the default stack size of new threads by setting the RUST_MIN_STACK environment variable. The value is in bytes; 8388608 bytes is 8 MiB.
$ RUST_MIN_STACK=8388608 cargo test
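If you would rather not depend on an environment variable, the same idea can be applied to a single thread you spawn yourself. The following is only a sketch: run_with_stack is a hypothetical helper, and the 8 MiB figure is copied from the command above rather than being a recommended value.

```rust
use std::thread;

/// Builds the 4 MiB array on a freshly spawned thread that has
/// `stack_bytes` of stack, and returns the number of f64 elements.
/// `run_with_stack` is a hypothetical helper for this sketch.
fn run_with_stack(stack_bytes: usize) -> usize {
    let handle = thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(|| {
            let stack = [[[0.0f64; 2]; 512]; 512];
            // Pass a reference through black_box so the optimizer
            // cannot simply discard the array.
            let stack = std::hint::black_box(&stack);
            stack.len() * stack[0].len() * stack[0][0].len()
        })
        .expect("failed to spawn thread");
    handle.join().expect("thread panicked")
}

fn main() {
    // 8 MiB of stack leaves room for the 4 MiB array.
    let elements = run_with_stack(8 * 1024 * 1024);
    println!("{} elements", elements);
}
```

This keeps the larger stack local to the code that needs it instead of changing the default for every thread in the process.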
cargo test runs the tests in parallel threads to improve total test time, while benchmarks are run sequentially in the same thread to reduce noise.
Due to the limited stack size, it is a bad idea to allocate this array on the stack. You have to either store it on the heap (box it) or as a global static mut.
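A minimal sketch of the heap option follows. It uses a Vec of rows rather than Box::new on the whole array, because Box::new([[[0.0; 2]; 512]; 512]) may still build the full 4 MiB value on the stack before moving it, particularly in unoptimized builds. The name heap_grid is made up for this example.

```rust
/// Allocates the 512 x 512 x 2 grid on the heap.
/// Only one 8 KiB row is ever built on the stack; the vec! macro
/// clones it into the heap allocation.
/// `heap_grid` is a hypothetical helper for this sketch.
fn heap_grid() -> Vec<[[f64; 2]; 512]> {
    vec![[[0.0f64; 2]; 512]; 512]
}

fn main() {
    let mut grid = heap_grid();
    // Indexing works the same way as with the stack array.
    grid[511][511][1] = 1.0;
    println!("{} rows of {} pairs", grid.len(), grid[0].len());
}
```

If a fixed-size owned buffer is preferred, calling into_boxed_slice() on the Vec gives a Box<[[[f64; 2]; 512]]> without going through the stack.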