 

What is the correct way to convert a Vec for FFI without reallocation?

Tags: vector, rust, ffi

I need to pass a Vec of elements across the FFI. While experimenting, I came across a few interesting points. I started by giving the FFI all three values (ptr, len and capacity) so that I could reconstruct the Vec to destroy it later:

let ptr = vec.as_mut_ptr();
let len = vec.len();
let cap = vec.capacity();
mem::forget(vec);
extern_fn(ptr, len, cap);

// ...

pub unsafe extern "C" fn free(ptr: *mut u8, len: usize, cap: usize) {
    let _ = Vec::from_raw_parts(ptr, len, cap);
}

I wanted to get rid of capacity because it's useless to my frontend; it exists only so that I can reconstruct the vector and free the memory.

Vec::shrink_to_fit() is tempting as it seems to eliminate the need to deal with capacity. Unfortunately, its documentation does not guarantee that it will make len == capacity, so I assume that passing len as the capacity to from_raw_parts() could trigger Undefined Behavior.

According to the docs, into_boxed_slice() seems to guarantee that len == capacity, so I used that next. Please correct me if I'm wrong. The problem is that it does not seem to guarantee that no reallocation happens. Here is a simple program:

fn main() {
    let mut v = Vec::with_capacity(1000);
    v.push(100u8);
    v.push(110);
    let ptr_1 = v.as_mut_ptr();
    let mut boxed_slice = v.into_boxed_slice();
    let ptr_2 = boxed_slice.as_mut_ptr();
    let ptr_3 = Box::into_raw(boxed_slice);
    println!("{:?}. {:?}. {:?}", ptr_1, ptr_2, ptr_3);
}

In the playground, it prints:

rustc 1.14.0 (e8a012324 2016-12-16)
0x7fdc9841b000. 0x7fdc98414018. 0x7fdc98414018

This is not good: it has to find new memory and copy the data instead of shedding the extra capacity in place.

Is there any other way I can pass my vector across the FFI (to C) and not pass capacity? It seems into_boxed_slice() is what I need, but why does it involve re-allocation and copying data?

ustulation asked Jan 18 '17 15:01




1 Answer

The reason is relatively simple.

Modern memory allocators segregate allocations into "sized" slabs, where each slab is responsible for a given range of sizes. For example:

  • 8-byte slab: anything from 1 to 8 bytes
  • 16-byte slab: anything from 9 to 16 bytes
  • 24-byte slab: anything from 17 to 24 bytes
  • ...

When you allocate memory, you ask for a given size, the allocator finds the right slab, gets a chunk from it, and returns your pointer.
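As a rough illustration of that lookup, here is a minimal sketch. The `size_class` function and its 8-byte step are hypothetical and only mirror the example list above; real allocators use their own size-class tables.

```rust
/// Hypothetical size-class rule mirroring the example list: round a
/// request up to the next multiple of 8 bytes. Not a real allocator's table.
fn size_class(request: usize) -> usize {
    const STEP: usize = 8;
    // Round up to the next multiple of STEP (a zero-byte request maps to 0).
    (request + STEP - 1) / STEP * STEP
}

fn main() {
    for request in [1, 8, 9, 16, 17, 24] {
        println!("{:>2} bytes -> {}-byte slab", request, size_class(request));
    }
}
```

Going from a size to a slab is cheap arithmetic like this; going from a bare pointer back to a slab is the hard direction.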

When you deallocate memory... how do you expect the allocator to find the right slab? There are 2 solutions:

  • the allocator has a way to search for the slab that contains your range of memory, which involves either a linear search through the slabs, some kind of global look-up table, or a similar mechanism
  • you tell the allocator the size of the allocated block

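The C interface (free, realloc) receives only the pointer, so it is stuck with the first solution, which is rather sub-par. Rust therefore uses the more efficient interface instead, the one where the onus is on the caller: deallocation must be given the size of the block, and for a Vec that size is derived from its capacity, not its length.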
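To make the caller-supplies-the-size interface concrete, here is a small sketch using `std::alloc` directly. The `round_trip` helper is hypothetical; `alloc`, `dealloc` and `Layout` are the standard library's raw allocation API, and `dealloc` must receive the same layout that was used to allocate.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Hypothetical helper: allocate `size` bytes, touch them, free them.
fn round_trip(size: usize) -> u8 {
    // `alloc` must not be called with a zero-sized layout.
    assert!(size > 0, "size must be non-zero");
    // The layout records the size and alignment of the block.
    let layout = Layout::array::<u8>(size).expect("size overflow");
    unsafe {
        let ptr = alloc(layout);
        assert!(!ptr.is_null(), "allocation failed");
        ptr.write(42);
        let value = ptr.read();
        // The caller hands the same layout back; the allocator never has
        // to work out the block size from the pointer alone.
        dealloc(ptr, layout);
        value
    }
}

fn main() {
    println!("{}", round_trip(1000));
}
```

This is why Vec::from_raw_parts needs the original capacity: it is what Vec uses to rebuild the layout it later passes to the allocator.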


So, you have two choices:

  1. Pass the capacity
  2. Ensure that the length and the capacity are equal
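Choice (2) might look like the following sketch. The names `into_ffi` and `free_ffi` are hypothetical; the idea is that a `Box<[u8]>` has no separate capacity, so (ptr, len) is enough to rebuild and free it. Note that `into_boxed_slice` may reallocate when the Vec has spare capacity, as the question observed.

```rust
use std::ptr;

/// Hypothetical export: gives the foreign side only (ptr, len).
fn into_ffi(v: Vec<u8>) -> (*mut u8, usize) {
    // Drops any excess capacity; this step may reallocate and copy.
    let boxed: Box<[u8]> = v.into_boxed_slice();
    let len = boxed.len();
    // Leak the box; ownership now travels with the raw pointer.
    let ptr = Box::into_raw(boxed) as *mut u8;
    (ptr, len)
}

/// # Safety
/// `ptr` and `len` must come from one `into_ffi` call and be freed only once.
unsafe fn free_ffi(ptr: *mut u8, len: usize) {
    // Rebuild the fat pointer and let Box free the block of exactly `len` bytes.
    let slice: *mut [u8] = ptr::slice_from_raw_parts_mut(ptr, len);
    unsafe { drop(Box::from_raw(slice)) };
}

fn main() {
    let mut v = Vec::with_capacity(1000);
    v.extend_from_slice(&[100u8, 110]);
    let (ptr, len) = into_ffi(v);
    let first = unsafe { *ptr };
    println!("len = {}, first = {}", len, first);
    unsafe { free_ffi(ptr, len) };
}
```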

As you realized, (2) may require a new allocation, which is quite undesirable. (1) can be implemented either by passing the capacity the whole way, or by stashing it at some point and retrieving it when you need it.
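One possible way to stash the capacity is a side table on the Rust side, keyed by pointer address. This is only a sketch under assumptions: `CAPACITIES`, `export` and `release` are hypothetical names, and a global mutex-protected map is just one of several places the capacity could live (a header in front of the data or an opaque handle would also work).

```rust
use std::collections::BTreeMap;
use std::mem::ManuallyDrop;
use std::sync::Mutex;

// Hypothetical side table: pointer address -> capacity of that allocation.
static CAPACITIES: Mutex<BTreeMap<usize, usize>> = Mutex::new(BTreeMap::new());

/// Hypothetical export: the foreign side sees only (ptr, len).
fn export(v: Vec<u8>) -> (*mut u8, usize) {
    // ManuallyDrop keeps the buffer alive without running Vec's destructor.
    let mut v = ManuallyDrop::new(v);
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());
    // Zero-capacity vectors share a dangling pointer, so they are not recorded.
    if cap > 0 {
        CAPACITIES.lock().unwrap().insert(ptr as usize, cap);
    }
    (ptr, len)
}

/// # Safety
/// `ptr` and `len` must come from one `export` call and be released only once.
unsafe fn release(ptr: *mut u8, len: usize) {
    // Retrieve the stashed capacity; unrecorded pointers are treated as capacity 0.
    let cap = CAPACITIES
        .lock()
        .unwrap()
        .remove(&(ptr as usize))
        .unwrap_or(0);
    assert!(len <= cap, "pointer was not produced by export");
    // Rebuild the Vec with its true capacity so it frees the right block size.
    unsafe { drop(Vec::from_raw_parts(ptr, len, cap)) };
}

fn main() {
    let mut v = Vec::with_capacity(1000);
    v.extend_from_slice(&[100u8, 110]);
    let (ptr, len) = export(v); // no reallocation, no copy
    println!("exported {} bytes", len);
    unsafe { release(ptr, len) };
}
```

The trade-off is a lock and a map lookup on every export and release, in exchange for never reallocating and never exposing capacity to the foreign side.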

That's it. You have to evaluate your trade-offs.

Matthieu M. answered Oct 26 '22 01:10
