Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the `PhantomData` actually doing in the implementation of `Vec`? [duplicate]

Tags:

rust

How does PhantomData work in Rust? In the Nomicon it says the following:

In order to tell dropck that we do own values of type T, and therefore may drop some T's when we drop, we must add an extra PhantomData saying exactly that.

To me that seems to imply that when we add a PhantomData field to a structure, say in the case of a Vec.

pub struct Vec<T> {
    data: *mut T,
    length: usize,
    capacity: usize,
    phantom: PhantomData<T>,
}

that the drop checker should forbid the following sequence of code:

fn main() -> () {
    let mut vector = Vec::new();

    let x = Box::new(1 as i32);
    let y = Box::new(2 as i32);
    let z = Box::new(3 as i32);

    vector.push(x);
    vector.push(y);
    vector.push(z);
}

Since the freeing of x, y, and z would occur before the freeing of the Vec, I would expect some complaint from the compiler. However, if you run the code above there is no warning or error.

like image 524
Novus Avatar asked Jan 08 '17 14:01

Novus


People also ask

Why is PhantomData needed?

PhantomData consumes no space, but simulates a field of the given type for the purpose of static analysis. This was deemed to be less error-prone than explicitly telling the type-system the kind of variance that you want, while also providing other useful things such as the information needed by drop check.

What is PhantomData in Rust?

PhantomData is a way to have a field that acts like the type parameter for subtyping, but has no runtime cost. Rust told us to make our struct look like so: struct Tagged<T>(usize, PhantomData<T>);

What is phantom data?

We call hidden and risky content 'Phantom Data' to express that it is often unknown or unseen and also has the strong potential to hurt your organization's operations. Most organizations have a Phantom Data problem and very few know how to solve it.


1 Answers

The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.


Deep dive: We have a combination of factors here:

  • We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).

  • Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.

  • However, the destructor for Vec<T> specifically opts out of this semantics for just that destructor (of Vec<T> itself) via the use of special unstable attributes (see RFC 1238 and RFC 1327). This allows for a vector to hold references that have the same lifetime of the vector itself. This is considered sound; after all, the vector itself will not dereference data pointed to by such references (all its doing is dropping values and deallocating the backing array), as long as an important caveat holds.

  • The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.

  • So, diving in even more deeply: the way that we confirm dropck validity for a given structure S, we first double check if S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself, and double check for each of its fields that everything is kosher according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)

  • In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).

  • Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).

  • BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."

The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).

like image 129
pnkfelix Avatar answered Sep 27 '22 22:09

pnkfelix