Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is it safe to clone a type-erased Arc via raw pointer?

Tags:

I'm in a situation where I'm working with data wrapped in an Arc, and I sometimes end up using into_raw to get the raw pointer to the underlying data. My use case also calls for type-erasure, so the raw pointer often gets cast to a *const c_void, then cast back to the appropriate concrete type when re-constructing the Arc.

I've run into a situation where it would be useful to be able to clone the Arc without needing to know the concrete type of the underlying data. As I understand it, it should be safe to reconstruct the Arc with a dummy type solely for the purpose of calling clone, so long as I never actually dereference the data. So, for example, this should be safe:

pub unsafe fn clone_raw(handle: *const c_void) -> *const c_void {
    let original = Arc::from_raw(handle);
    let copy = original.clone();
    mem::forget(original);
    Arc::into_raw(copy)
}

Is there anything that I'm missing that would make this actually unsafe? Also, I assume the answer would apply to Rc as well, but if there are any differences please let me know!

like image 340
randomPoison Avatar asked Jan 09 '20 20:01

randomPoison


1 Answers

This is almost always unsafe.

An Arc<T> is just a pointer to a heap-allocated struct which roughly looks like

struct ArcInner<T: ?Sized> {
    strong: atomic::AtomicUsize,
    weak: atomic::AtomicUsize,
    data: T,  // You get a raw pointer to this element
}

into_raw() gives you a pointer to the data element. The implementation of Arc::from_raw() takes such a pointer, assumes that it's a pointer to the data-element in an ArcInner<T>, walks back in memory and assumes to find an ArcInner<T> there. This assumption depends on the memory-layout of T, specifically it's alignment and therefore it's exact placement in ArcInner.

If you call into_raw() on an Arc<U> and then call from_raw() as if it was an Arc<V> where U and V differ in alignment, the offset-calculation of where U/V is in ArcInner will be wrong and the call to .clone() will corrupt the data structure. Dereferencing T is therefore not required to trigger memory unsafety.

In practice, this might not be a problem: Since data is the third element after two usize-elements, most T will probably be aligned the same way. However, if the stdlib-implementation changes or you end up compiling for a platform where this assumption is wrong, reconstructing an Arc<V>::from_raw that was created by an Arc<U> where the memory layout of V and U is different will be unsafe and crash.


Update:

Having thought about it some more I downgrade my vote from "might be safe, but cringy" to "most likely unsafe" because I can always do

#[repr(align(32))]
struct Foo;

let foo = Arc::new(Foo);

In this example Foo will be aligned to 32 bytes, making ArcInner<Foo> 32 bytes in size (8+8+16+0) while a ArcInner<()> is just 16 bytes (8+8+0+0). Since there is no way to tell what the alignment of T is after the type has been erased, there is no way to reconstruct a valid Arc.

There is an escape hatch that might be safe in practice: By wrapping T into another Box, the layout of ArcInner<T> is always the same. In order to force this upon any user, you can do something like

struct ArcBox<T>(Arc<Box<T>>)

and implement Deref on that. Using ArcBox instead of Arc forces the memory layout of ArcInner to always be the same, because T is behind another pointer. This, however, means that all access to T requires a double dereference, which might badly affect performance.

like image 56
user2722968 Avatar answered Sep 30 '22 19:09

user2722968