What is the `PhantomData` actually doing in the implementation of `Vec`? [duplicate]

Tags: rust

How does PhantomData work in Rust? In the Nomicon it says the following:

In order to tell dropck that we do own values of type T, and therefore may drop some T's when we drop, we must add an extra PhantomData saying exactly that.

To me that seems to imply that when we add a PhantomData field to a structure, say in the case of a Vec:

use std::marker::PhantomData;

pub struct Vec<T> {
    data: *mut T,
    length: usize,
    capacity: usize,
    phantom: PhantomData<T>, // marks that this Vec owns values of type T
}

the drop checker should forbid the following sequence of code:

fn main() {
    let mut vector = Vec::new();

    let x = Box::new(1_i32);
    let y = Box::new(2_i32);
    let z = Box::new(3_i32);

    vector.push(x);
    vector.push(y);
    vector.push(z);
}

Since the freeing of x, y, and z would occur before the freeing of the Vec, I would expect some complaint from the compiler. However, the code above compiles and runs without any warning or error.


asked Jan 08 '17 14:01

Novus

1 Answer

The PhantomData<T> within Vec<T> (held indirectly via a Unique<T> within RawVec<T>) communicates to the compiler that the vector may own instances of T, and therefore the vector may run destructors for T when the vector is dropped.
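To make the ownership claim concrete, here is a minimal sketch of a single-slot owner built from a raw pointer plus a PhantomData<T> marker. The names MyBox, Counted, and drops_run_by_owner are hypothetical and are not taken from the standard library; the real Vec<T> is considerably more involved. The sketch only shows that dropping the owner runs the destructor of the T it holds.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

// Hypothetical single-slot owner: a raw pointer plus a PhantomData<T> marker.
struct MyBox<T> {
    ptr: *mut T,
    _owns: PhantomData<T>, // states "this type owns a T and may drop it"
}

impl<T> MyBox<T> {
    fn new(value: T) -> Self {
        MyBox { ptr: Box::into_raw(Box::new(value)), _owns: PhantomData }
    }
}

impl<T> Drop for MyBox<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::into_raw` in `new` and is released only here.
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

// Element type whose destructor bumps a shared counter.
struct Counted<'a>(&'a Cell<u32>);

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

// Returns how many element destructors ran when the owner was dropped.
fn drops_run_by_owner() -> u32 {
    let counter = Cell::new(0);
    {
        let _owner = MyBox::new(Counted(&counter));
    } // `_owner` is dropped here, which in turn drops the `Counted` it owns
    counter.get()
}

fn main() {
    println!("element destructors run: {}", drops_run_by_owner());
}
```

Note that the counter is declared before the owner, so the data referenced by the element strictly outlives the owner.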

Deep dive: We have a combination of factors here:

We have a Vec<T> which has an impl Drop (i.e. a destructor implementation).
Under the rules of RFC 1238, this would usually imply a relationship between instances of Vec<T> and any lifetimes that occur within T, by requiring that all lifetimes within T strictly outlive the vector.
However, the destructor for Vec<T> specifically opts out of these semantics for just that destructor (of Vec<T> itself) via special unstable attributes (see RFC 1238 and RFC 1327). This allows a vector to hold references that have the same lifetime as the vector itself. This is considered sound as long as an important caveat holds; after all, the vector itself will not dereference data pointed to by such references (all it is doing is dropping values and deallocating the backing array).
The important caveat: While the vector itself will not dereference pointers within its contained values while destructing itself, it will drop the values held by the vector. If those values of type T themselves have destructors, those destructors for T get run. And if those destructors access the data held within their references, then we would have a problem if we allowed dangling pointers within those references.
So, diving in even more deeply: to confirm dropck validity for a given structure S, we first check whether S itself has an impl Drop for S (and if so, we enforce rules on S with respect to its type parameters). But even after that step, we then recursively descend into the structure of S itself and check, for each of its fields, that everything is acceptable according to dropck. (Note that we do this even if a type parameter of S is tagged with #[may_dangle].)
In this specific case, we have a Vec<T> which (indirectly via RawVec<T>/Unique<T>) owns a collection of values of type T, represented in a raw pointer *const T. However, the compiler attaches no ownership semantics to *const T; that field alone in a structure S implies no relationship between S and T, and thus enforces no constraint in terms of the relationship of lifetimes within the types S and T (at least from the viewpoint of dropck).
Therefore, if the Vec<T> had solely a *const T, the recursive descent into the structure of the vector would fail to capture the ownership relation between the vector and the instances of T contained within the vector. That, combined with the #[may_dangle] attribute on T, would cause the compiler to accept unsound code (namely cases where destructors for T end up trying to access data that has already been deallocated).
BUT: Vec<T> does not solely contain a *const T. There is also a PhantomData<T>, and that conveys to the compiler "hey, even though you can assume (due to the #[may_dangle] T) that the destructor for Vec won't access data of T when the vector is dropped, it is still possible that some destructor of T itself will access data of T as the vector is dropped."
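The default rule from the second point above can be observed on stable Rust with a user-defined container that has an ordinary Drop impl and no #[may_dangle]. The sketch below uses a hypothetical type named Strict; it is an illustration of the rule, not code from the standard library. The accepted ordering is compiled; the ordering that dropck is expected to reject is shown only in a comment.

```rust
// Hypothetical container with an ordinary `Drop` impl and no `#[may_dangle]`.
struct Strict<T>(Option<T>);

impl<T> Drop for Strict<T> {
    fn drop(&mut self) {
        // Does nothing with the payload, but dropck cannot assume that.
    }
}

// Accepted: the referent is declared first, so it strictly outlives the container.
fn referent_declared_first() -> usize {
    let s = String::from("hello"); // declared first, so dropped last
    let c = Strict(Some(&s));      // `c` is dropped before `s`
    c.0.map_or(0, |r| r.len())     // `Option<&String>` is `Copy`, so this only copies
}

// Swapping the declaration order is expected to be rejected, with an error
// along the lines of "`s` does not live long enough":
//
//     let mut c = Strict(None);
//     let s = String::from("hello");
//     c.0 = Some(&s);

fn main() {
    println!("length seen through the container: {}", referent_declared_first());
}
```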

The end effect: Given Vec<T>, if T doesn't have a destructor, then the compiler provides you with more flexibility (namely, it allows a vector to hold data with references to data that lives for the same amount of time as the vector itself, even though such data may be torn down before the vector is). But if T does have a destructor (and that destructor is not otherwise communicating to the compiler that it won't access any referenced data), then the compiler is more strict, requiring any referenced data to strictly outlive the vector (thus ensuring that when the destructor for T runs, all the referenced data will still be valid).
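Both halves of that end effect can be sketched on stable Rust with the standard Vec. The function names and the Loud type below are hypothetical and exist only for illustration. The first function relies on the extra flexibility (the vector is declared before, and therefore dropped after, the data it references). The second uses an element type with a destructor, so the referenced data has to be declared before the vector.

```rust
use std::cell::RefCell;

// Flexible case: `&String` has no destructor, so the vector may be dropped
// after the string it points into.
fn plain_references() -> usize {
    let mut v: Vec<&String> = Vec::new(); // declared first, so dropped last
    let s = String::from("hello");        // dropped before `v`
    v.push(&s);
    v.len()
}

// Element type whose destructor reads through its references.
struct Loud<'a> {
    text: &'a String,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Loud<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.text.clone());
    }
}

// Strict case: the referenced data is declared before the vector, so it is
// still alive when the vector drops its elements.
fn elements_with_destructors() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    let s = String::from("hello");
    {
        let mut v = Vec::new();
        v.push(Loud { text: &s, log: &log });
    } // `v` is dropped here; `Loud::drop` reads `s` while it is still valid
    log.into_inner()
}

fn main() {
    println!("{}", plain_references());
    println!("{:?}", elements_with_destructors());
}
```

Declaring the Vec<Loud> before s, as in the first function, is expected to be rejected by the compiler, since Loud's destructor could then read a string that has already been freed.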


answered Sep 27 '22 22:09

pnkfelix
