Is there any way to allocate a standard Rust array directly on the heap, skipping the stack entirely?

Tags:

There are several questions already on Stack Overflow about allocating an array (say [i32]) on the heap. The general recommendation is boxing, e.g. Box<[i32]>. But while boxing works fine enough for smaller arrays, the problem is that the array being boxed has to first be allocated on the stack.

So if the array is too large (say 10 million elements), you will - even with boxing - get a stack overflow (one is unlikely to have a stack that large).

The suggestion then is using Vec<T> instead, that is Vec<i32> in our example. And while that does do the job, it does have a performance impact.

Consider the following program:

fn main() {
    const LENGTH: usize = 10_000;
    let mut a: [i32; LENGTH] = [0; LENGTH];
    for j in 0..LENGTH {
        for i in 0..LENGTH {
            a[i] = j as i32;
        }
    }
}

time tells me that this program takes about 2.9 seconds to run. I use 10'000 in this example, so I can allocate that on the stack, but I really want one with 10 million.

Now consider the same program but with Vec<T> instead:

fn main() {
    const LENGTH: usize = 10_000;
    let mut a: Vec<i32> = vec![0; LENGTH];
    for j in 0..LENGTH {
        for i in 0..LENGTH {
            a[i] = j as i32;
        }
    }
}

time tells me that this program takes about 5 seconds to run. Now time isn't super exact, but the difference of about 2 seconds for such a simple program is not an insignificant impact.

Storage is storage, the program with array is just as fast when the array is boxed. So it's not the heap slowing the Vec<T> version down, but the Vec<T> structure itself.

I also tried with a HashMap (specifically HashMap<usize, i32> to mimic an array structure), but that's far slower than the Vec<T> solution.

If my LENGTH had been 10 million, the first version wouldn't even have run.

If that's not possible, is there a structure that behaves like an array (and Vec<T>) on the heap, but can match the speed and performance of an array?

850

asked Dec 09 '18 09:12

Svip

2 Answers

Summary: your benchmark is flawed; just use a Vec (as described here), possibly with into_boxed_slice, as it is incredibly unlikely to be slower than a heap allocated array.

Unfortunately, your benchmarks are flawed. First of all, you probably didn't compile with optimizations (--release for cargo, -O for rustc). Because if you would have, the Rust compiler would have removed all of your code. See the assembly here. Why? Because you never observe the vector/array, so there is no need to do all that work in the first place.

Also, your benchmark is not testing what you actually want to test. You are comparing an stack-allocated array with a heap-allocated vector. You should compare the Vec to a heap allocated array.

Don't feel bad though: writing benchmarks is incredible hard for many reasons. Just remember: if you don't know a lot about writing benchmarks, better don't trust your own benchmarks without asking others first.

I fixed your benchmark and included all three possibilities: Vec, array on stack and array on heap. You can find the full code here. The results are:

running 3 tests
test array_heap  ... bench:   9,600,979 ns/iter (+/- 1,438,433)
test array_stack ... bench:   9,232,623 ns/iter (+/- 720,699)
test vec_heap    ... bench:   9,259,912 ns/iter (+/- 691,095)

Surprise: the difference between the versions are way less than the variance of the measurement. So we can assume they all are pretty equally fast.

Note that this benchmark is still pretty bad. The two loops can just be replaced by one loop setting all array elements to LENGTH - 1. From taking a quick look at the assembly (and from the rather long time of 9ms), I think that LLVM is not smart enough to actually perform this optimization. But things like this are important and one should be aware of that.

Finally, let's discuss why both solutions should be equally fast and whether there are actually differences in speed.

The data section of a Vec<T> has exactly the same memory layout as a [T]: just many Ts contiguously in memory. Super simple. This also means both exhibit the same caching-behavior (specifically, being very cache-friendly).

The only difference is that a Vec might have more capacity than elements. So Vec itself stores (pointer, length, capacity). That is one word more than a simple (boxed) slice (which stores (pointer, length)). A boxed array doesn't need to store the length, as it's already in the type, so it is just a simple pointer. Whether or not we store one, two or three words is not really important when you will have millions of elements anyway.

Accessing one element is the same for all three: we do a bounds check first and then calculate the target pointer via base_pointer + size_of::<T>() * index. But it's important to note that the array storing its length in the type means that the bounds check can be removed more easily by the optimizer! This can be a real advantage.

However, bounds checks are already usually removed by the smart optimizer. In the benchmark code I posted above, there are no bounds checks in the assembly. So while a boxed array could be a bit faster by removed bounds checks, (a) this will be a minor performance difference and (b) it's very unlikely that you will have a lot of situations where the bound check is removed for the array but not for the Vec/slice.

194

answered Sep 16 '22 14:09

Lukas Kalbertodt

If you really want a heap-allocated array, i.e. Box<[i32; LENGTH]> then you can use:

fn main() {
    const LENGTH: usize = 10_000_000;

    let mut a = {
        let mut v: Vec<i32> = Vec::with_capacity(LENGTH);

        // Explicitly set length which is safe since the allocation is
        // sized correctly.
        unsafe { v.set_len(LENGTH); };

        // While not required for this particular example, in general
        // we want to initialize elements to a known value.
        let mut slice = v.into_boxed_slice();
        for i in &mut slice[..] {
            *i = 0;
        }

        let raw_slice = Box::into_raw(slice);

        // Using `from_raw` is safe as long as the pointer is
        // retrieved using `into_raw`.
        unsafe {
            Box::from_raw(raw_slice as *mut [i32; LENGTH])
        }
    };

    // This is the micro benchmark from the question.
    for j in 0..LENGTH {
        for i in 0..LENGTH {
            a[i] = j as i32;
        }
    }
}

It's not going to be faster than using a vector since Rust does bounds-checking even on arrays, but it has a smaller interface which might make sense in terms of software design.

answered Sep 16 '22 14:09

malthe

Related questions
                            
                                Pull-to-refresh modifies array value?
                            
                                Sort an array so the difference of elements a[i]-a[i+1]<=a[i+1]-a[i+2]
                            
                                Why Swift doesn't type inference to Any when put multiple type item in Array
                            
                                C++ Pointer to const array of const pointers
                            
                                Possible to create statically allocated array in swift?
                            
                                How to resolve this " Uncaught TypeError: Cannot convert undefined or null to object "
                            
                                Find the dimensionality of a javascript array
                            
                                Filter and sort a JavaScript array
                            
                                Is there a way to assign values to an Option Base 1 array in VBA using a single line?
                            
                                time complexity for iterating through an array list
                            
                                How do I remove opposite values from an array?
                            
                                Worst case of the Quicksort algorithm
                            
                                Python: Trimming underscores from end of String
                            
                                How to avoid a retain cycle when using an array of delegates in Swift [duplicate]
                            
                                react count object properties in an array
                            
                                Why does first() rxjs operator keep returning array
                            
                                Function which accepts array of arbitrary size as argument (is it possible in Golang?)
                            
                                Combine elements from two arrays by pairs
                            
                                Why do we use `const` for strings when defining with pointers in C?
                            
                                Good way to filter list distincted by property and ordered by date

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Is there any way to allocate a standard Rust array directly on the heap, skipping the stack entirely?

Tags:

arrays

heap-memory

rust

Svip

People also ask

2 Answers

Lukas Kalbertodt

malthe

Recent Activity

Donate For Us