I'm working on a library that helps transfer types that fit in a pointer-sized integer across FFI boundaries. Suppose I have a struct like this:
```rust
use std::mem::{size_of, align_of};

struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

assert_eq!(size_of::<PaddingDemo>(), size_of::<usize>());
assert_eq!(align_of::<PaddingDemo>(), align_of::<usize>());
```
On a 64-bit target, this struct has 1 data byte and 7 padding bytes. I want to pack an instance of this struct into a `usize` and then unpack it on the other side of an FFI boundary. Because this library is generic, I'm using `MaybeUninit` and `ptr::write`:
```rust
use std::ptr;
use std::mem::MaybeUninit;

let data = PaddingDemo { data: 12, force_pad: [] };

// In order to ensure all the bytes are initialized,
// zero-initialize the buffer
let mut packed: MaybeUninit<usize> = MaybeUninit::zeroed();
let ptr = packed.as_mut_ptr() as *mut PaddingDemo;
let packed_int = unsafe {
    std::ptr::write(ptr, data);
    packed.assume_init()
};

// Attempt to trigger UB in Miri by reading the
// possibly uninitialized bytes
let copied = unsafe { ptr::read(&packed_int) };
```
Does that `assume_init` call trigger undefined behavior? In other words, when `ptr::write` copies the struct into the buffer, does it copy the uninitialized-ness of the padding bytes, overwriting the zero bytes that were previously initialized?
Currently, when this or similar code is run in Miri, it doesn't detect any undefined behavior. However, per the discussion about this issue on GitHub, `ptr::write` is supposedly allowed to copy those padding bytes, and furthermore to copy their uninitialized-ness. Is that true? The docs for `ptr::write` don't talk about this at all, nor does the Nomicon section on uninitialized memory.
> Does that `assume_init` call trigger undefined behavior?
Yes. "Uninitialized" is just another value that a byte in the Rust Abstract Machine can have, alongside the usual 0x00 through 0xFF. Let us write this special byte as 0xUU. (See this blog post for a bit more background on this subject.) 0xUU is preserved by copies just like any other value a byte can have.
But the details are a bit more complicated. There are two ways to copy data around in memory in Rust. Unfortunately, the details for this are also not explicitly specified by the Rust language team, so what follows is my personal interpretation. I think what I am saying is uncontroversial unless marked otherwise, but of course that could be a wrong impression.
The first is a byte-wise (untyped) copy. In general, when a range of bytes is being copied, the source range simply overwrites the target range: if the source range was "0x00 0xUU 0xUU 0xUU", then after the copy the target range will have that exact list of bytes.
This is how `memcpy`/`memmove` behave in C (in my interpretation of the standard, which is unfortunately not very clear here). In Rust, `ptr::copy{,_nonoverlapping}` probably performs a byte-wise copy, but it's not actually precisely specified right now, and some people might want to say it is typed as well. This was discussed a bit in this issue.
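A small sketch of a byte-wise copy, assuming (as above) that `ptr::copy_nonoverlapping` copies bytes verbatim. Copying at type `MaybeUninit<u8>` keeps the example sound under either interpretation, since a `MaybeUninit<u8>` is allowed to hold an uninitialized byte. The function name `byte_copy_demo` is hypothetical and only for illustration:

```rust
use std::mem::MaybeUninit;
use std::ptr;

/// Copies a 4-byte buffer whose last three bytes are uninitialized and
/// returns the one byte that may legally be read afterwards.
fn byte_copy_demo() -> u8 {
    // Source: "0x00 0xUU 0xUU 0xUU".
    let mut src = [MaybeUninit::<u8>::uninit(); 4];
    src[0] = MaybeUninit::new(0x00);

    // Target starts out fully initialized: "0xFF 0xFF 0xFF 0xFF".
    let mut dst = [MaybeUninit::new(0xFFu8); 4];

    // SAFETY: both buffers are valid for 4 elements and do not overlap.
    // After the copy, dst holds "0x00 0xUU 0xUU 0xUU": the previously
    // initialized 0xFF bytes have been overwritten with 0xUU.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) };

    // Only dst[0] may be read; calling assume_init on dst[1..] would now
    // be undefined behavior.
    unsafe { dst[0].assume_init() }
}

fn main() {
    assert_eq!(byte_copy_demo(), 0x00);
}
```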
The alternative is a "typed copy", which is what happens on every normal assignment (`=`) and when passing values to/from a function. A typed copy interprets the source memory at some type `T`, and then "re-serializes" that value of type `T` into the target memory.
The key difference from a byte-wise copy is that information which is not relevant at the type `T` is lost. This is basically a complicated way of saying that a typed copy "forgets" padding and effectively resets it to uninitialized. Compared to an untyped copy, a typed copy loses more information: untyped copies preserve the underlying representation, while typed copies only preserve the represented value.
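The difference can be illustrated with a toy model. `AbstractByte`, `untyped_copy`, and `typed_copy_padding_demo` are hypothetical names invented for this sketch, not part of Rust or its standard library. The model assumes an 8-byte type whose only field is a `u8` at offset 0, and it picks 0xUU for the forgotten padding, although a typed copy could leave any bytes there:

```rust
/// Toy model of one byte in the Rust Abstract Machine: an ordinary value
/// 0x00..=0xFF, or the special uninitialized byte written 0xUU above.
/// Illustration only; this is not a real Rust API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Init(u8),
    Uninit,
}

use AbstractByte::{Init, Uninit};

/// Byte-wise (untyped) copy: the target receives exactly the source
/// bytes, 0xUU included. The representation is preserved.
fn untyped_copy(src: &[AbstractByte]) -> Vec<AbstractByte> {
    src.to_vec()
}

/// Typed copy at a hypothetical 8-byte type with a single `u8` field at
/// offset 0 and 7 padding bytes. Only the field's value is preserved;
/// this model resets the padding to Uninit. Returns None when the field
/// byte is uninitialized, modelling the UB of reading an uninitialized u8.
fn typed_copy_padding_demo(src: &[AbstractByte; 8]) -> Option<[AbstractByte; 8]> {
    let field = match src[0] {
        Init(v) => v,
        Uninit => return None,
    };
    let mut out = [Uninit; 8];
    out[0] = Init(field);
    Some(out)
}

fn main() {
    // The bytes of 0usize: all initialized zeros.
    let zeros = [Init(0); 8];
    assert_eq!(untyped_copy(&zeros), zeros.to_vec());

    let typed = typed_copy_padding_demo(&zeros).unwrap();
    assert_eq!(typed[0], Init(0));
    assert!(typed[1..].iter().all(|&b| b == Uninit));
}
```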
So even when you transmute `0usize` to `PaddingDemo`, a typed copy of that value can reset this to "0x00 0xUU 0xUU 0xUU" (or any other possible bytes for the padding), assuming `data` sits at offset 0, which is not guaranteed (add `#[repr(C)]` if you want that guarantee).
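A sketch of the `#[repr(C)]` variant; `PaddingDemoC` is a hypothetical name for this example, and `std::mem::offset_of!` requires Rust 1.77 or newer:

```rust
use std::mem::{align_of, size_of};

/// Hypothetical #[repr(C)] variant of PaddingDemo: with repr(C), fields are
/// laid out in declaration order, so `data` is guaranteed to be at offset 0.
#[allow(dead_code)]
#[repr(C)]
struct PaddingDemoC {
    data: u8,
    force_pad: [usize; 0],
}

fn main() {
    // `offset_of!` needs Rust 1.77 or newer.
    assert_eq!(std::mem::offset_of!(PaddingDemoC, data), 0);
    assert_eq!(size_of::<PaddingDemoC>(), size_of::<usize>());
    assert_eq!(align_of::<PaddingDemoC>(), align_of::<usize>());
}
```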
In your case, `ptr::write` takes an argument of type `PaddingDemo`, and the argument is passed via a typed copy. So already at that point, the padding bytes may change arbitrarily; in particular, they may become 0xUU.
Whether your code has UB then depends on yet another factor, namely whether having an uninitialized byte in a `usize` is UB. The question is: does a (partially) uninitialized range of memory represent some integer? Currently it does not, and thus there is UB. However, whether that should be the case is heavily debated, and it seems likely that we will eventually permit it.
Many other details are still unclear, though. For example, transmuting "0x00 0xUU 0xUU 0xUU" to an integer may well result in a fully uninitialized integer; that is, integers may not be able to preserve "partial initialization". To preserve partially initialized bytes in integers, we would basically have to say that an integer has no abstract "value" and is just a sequence of (possibly uninitialized) bytes. This does not reflect how integers get used in operations like `/`. (Some of this also depends on LLVM decisions around `poison` and `freeze`; LLVM might decide that when doing a load at integer type, the result is fully `poison` if any input byte is `poison`.) So even if the code is not UB because we permit uninitialized integers, it may not behave as expected, because the data you want to transfer is being lost.
If you want to transfer raw bytes around, I suggest using a type suited for that, such as `MaybeUninit`. If you use an integer type, the goal should be to transfer integer values, i.e., numbers.
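One possible shape for this, sketched under assumptions: the helper names `pack`, `unpack`, and `round_trip` are hypothetical, and the packed value travels as `MaybeUninit<usize>`, which the standard library documents as having the same size, alignment, and ABI as `usize`. The value is only ever read back at its own type `T`, never at `usize`, so its padding bytes are never interpreted as part of an integer on the Rust side:

```rust
use std::mem::{align_of, size_of, MaybeUninit};
use std::ptr;

/// Hypothetical helper: store a `T` that fits in a `usize` inside a
/// `MaybeUninit<usize>`. Padding bytes of `T` may stay uninitialized,
/// which `MaybeUninit` is allowed to hold.
fn pack<T>(value: T) -> MaybeUninit<usize> {
    assert!(size_of::<T>() <= size_of::<usize>(), "T is too large");
    assert!(align_of::<T>() <= align_of::<usize>(), "T is over-aligned");
    let mut packed = MaybeUninit::<usize>::uninit();
    // SAFETY: the buffer is large enough and sufficiently aligned for `T`
    // (checked above), and `ptr::write` does not read the old contents.
    unsafe { ptr::write(packed.as_mut_ptr().cast::<T>(), value) };
    packed
}

/// Hypothetical helper: move the `T` back out of the buffer.
///
/// # Safety
/// `packed` must have been produced by `pack::<T>` for the same `T`.
unsafe fn unpack<T>(packed: MaybeUninit<usize>) -> T {
    // Typed read at `T` (not at `usize`), so only `T`'s fields matter.
    unsafe { ptr::read(packed.as_ptr().cast::<T>()) }
}

#[allow(dead_code)]
struct PaddingDemo {
    data: u8,
    force_pad: [usize; 0],
}

/// Round-trips a `PaddingDemo` through the packed representation.
fn round_trip(data: u8) -> u8 {
    let packed = pack(PaddingDemo { data, force_pad: [] });
    // SAFETY: `packed` came from `pack::<PaddingDemo>`.
    let unpacked: PaddingDemo = unsafe { unpack(packed) };
    unpacked.data
}

fn main() {
    assert_eq!(round_trip(12), 12);
}
```

In an `extern "C"` signature, this would mean declaring the parameter as `MaybeUninit<usize>` rather than `usize`, so that no typed copy at an integer type ever sees the padding.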