Does reading or writing a whole 32-bit word, even though we only have a reference to a part of it, result in undefined behaviour?

I'm trying to understand what exactly the Rust aliasing/memory model allows. In particular I'm interested in when accessing memory outside the range you have a reference to (which might be aliased by other code on the same or different threads) becomes undefined behaviour.

The following examples all access memory outside what is ordinarily allowed, but in ways that would be safe if the compiler produced the obvious assembly code. I also see little potential for conflict with compiler optimizations, but they might still violate the aliasing rules of Rust or LLVM and thus constitute undefined behaviour.

The operations are all properly aligned and thus cannot cross a cache-line or page boundary.

  1. Read the aligned 32-bit word surrounding the data we want to access and discard the parts outside of what we're allowed to read.

    Variants of this could be useful in SIMD code. (A short usage sketch for all three examples follows the list.)

    pub fn read(x: &u8) -> u8 {
        let pb = x as *const u8;
        // Round the address down to the surrounding aligned 32-bit word.
        let pw = ((pb as usize) & !3) as *const u32;
        // After to_le(), the byte at address offset k sits in bits 8*k..8*k+8.
        let w = unsafe { *pw }.to_le();
        // Shift our byte down and discard the rest.
        (w >> ((pb as usize) & 3) * 8) as u8
    }
    
  2. Same as 1, but reads the 32-bit word using an atomic load (AtomicU32::load with Ordering::Relaxed).

    use std::sync::atomic::{AtomicU32, Ordering};

    pub fn read_vol(x: &u8) -> u8 {
        let pb = x as *const u8;
        // Same as `read`, but view the surrounding aligned word as an AtomicU32.
        let pw = ((pb as usize) & !3) as *const AtomicU32;
        let w = unsafe { (&*pw).load(Ordering::Relaxed) }.to_le();
        (w >> ((pb as usize) & 3) * 8) as u8
    }
    
  3. Replace the aligned 32-bit word containing the byte we care about, using CAS. It overwrites the parts outside what we're allowed to access with what's already there, so it only affects the parts we're allowed to access.

    This could be useful to emulate small atomic types using bigger ones. I used AtomicU32 for simplicity; in practice, AtomicUsize is the interesting one.

    use std::sync::atomic::{AtomicU32, Ordering};

    pub fn write(x: &mut u8, value: u8) {
        let pb = x as *const u8;
        // View the surrounding aligned 32-bit word as an AtomicU32.
        let atom_w = unsafe { &*(((pb as usize) & !3) as *const AtomicU32) };
        let mut old = atom_w.load(Ordering::Relaxed);
        loop {
            let shift = ((pb as usize) & 3) * 8;
            // Clear our byte, keep the neighbouring bytes as they are, and
            // splice in the new value.
            let new = u32::from_le((old.to_le() & !(0xFF_u32 << shift)) | ((value as u32) << shift));
            match atom_w.compare_exchange_weak(old, new, Ordering::SeqCst, Ordering::Relaxed) {
                Ok(_) => break,
                Err(x) => old = x,
            }
        }
    }
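
For reference, here is a small single-threaded sketch of how I intend these helpers to be called (it assumes the three functions above are in scope; the wrapper type just forces the buffer onto a 4-byte boundary so the surrounding word lies entirely within my own allocation):

    // Force 4-byte alignment so the aligned 32-bit word around each byte is
    // entirely inside this buffer.
    #[repr(align(4))]
    struct Aligned([u8; 4]);

    fn main() {
        let mut buf = Aligned([1, 2, 3, 4]);
        assert_eq!(read(&buf.0[2]), 3);      // example 1
        assert_eq!(read_vol(&buf.0[2]), 3);  // example 2
        write(&mut buf.0[1], 0xAA);          // example 3
        assert_eq!(buf.0, [1, 0xAA, 3, 4]);  // neighbouring bytes unchanged
    }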
    
asked May 04 '18 by CodesInChaos

1 Answer

This is a very interesting question. There are actually several issues with these functions, making them unsound (i.e., not safe to expose) for various formal reasons. At the same time, I am unable to actually construct a problematic interaction between these functions and compiler optimizations.

Out-of-bounds accesses

I'd say all of these functions are unsound because they can access unallocated memory. I can call each of them with a &*Box::new(0u8) or &mut *Box::new(0u8), resulting in out-of-bounds accesses, i.e. accesses beyond what was allocated using malloc (or whatever allocator is used). Neither C nor LLVM permits such accesses. (I'm using the heap because I find it easier to think about allocations there, but the same applies to the stack, where every stack variable is really its own independent allocation.)
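
To make this concrete, here is a sketch of the kind of call I mean (read is the function from the question; the same argument works for the other two functions):

    // Box::new(0u8) allocates a single byte. `read` rounds the address down
    // and loads the whole aligned 32-bit word, so up to 3 of the bytes it
    // touches lie outside that allocation -- an out-of-bounds access at the
    // LLVM level, however harmless the generated assembly may look.
    let b: Box<u8> = Box::new(0u8);
    let _ = read(&*b);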

Granted, the LLVM language reference doesn't actually define when a load has undefined behavior due to the access not being inside the object. However, we can get a hint in the documentation of getelementptr inbounds, which says

The in bounds addresses for an allocated object are all the addresses that point into the object, plus the address one byte past the end.

I am fairly certain that being in bounds is a necessary but not sufficient requirement for actually using an address with load/store.
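
For what it's worth, the same idea surfaces in Rust's own rules for raw pointer arithmetic: pointer::add/offset require the computed pointer to stay within the same allocated object (or one byte past its end), independently of whether it is ever dereferenced. A tiny sketch:

    let arr = [0u8; 4];
    let p = arr.as_ptr();
    // In bounds: one byte past the end of the allocation is still allowed.
    let _one_past_end = unsafe { p.add(4) };
    // Already UB per the documented contract of `add`, even though nothing
    // is ever loaded through the pointer:
    // let _oob = unsafe { p.add(5) };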

Note that this is independent of what happens on the assembly level; LLVM will do optimizations based on a much higher-level memory model that argues in terms of allocated blocks (or "objects" as C calls them) and staying within the bounds of these blocks. C (and Rust) are not assembly, and it is not possible to use assembly-based reasoning on them. Most of the time it is possible to derive contradictions from assembly-based reasoning (see e.g. this bug in LLVM for a very subtle example: casting a pointer to an integer and back is not a NOP). This time, however, the only examples I can come up with are fairly far-fetched: For example, with memory-mapped IO, even reads from a location could "mean" something to the underlying hardware, and there could be such a read-sensitive location sitting right next to the one that's passed into read. But really I don't know much about this kind of embedded/driver development, so this may be entirely unrealistic.

(EDIT: I should add that I am not an LLVM expert. Probably the llvm-dev mailing list is a better place to determine if they are willing to commit to permitting such out-of-bounds accesses.)

Data races

There is another reason why at least some of these functions are not sound: concurrency. You clearly already saw this coming, judging from your use of atomic operations.

Both read and read_vol are definitely unsound under the concurrency semantics of C11. Imagine x is the first element of a [u8], and another thread is writing to the second element at the same time as we execute read/read_vol. Our read of the whole 32-bit word overlaps with the other thread's write. This is a classical "data race": two threads accessing the same location at the same time, one access being a write, and one access not being atomic. Under C11, any data race is UB, so we are out. LLVM is slightly more permissive, so both read and read_vol are probably allowed, but right now Rust declares that it uses the C11 model.
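
Here is a sketch of that scenario (using read from the question, and scoped threads merely as a convenient way of writing it down):

    // Force buf[0] and buf[1] into the same aligned 32-bit word.
    #[repr(align(4))]
    struct Aligned([u8; 4]);

    let mut buf = Aligned([0; 4]);
    let (a, b) = buf.0.split_at_mut(1);
    std::thread::scope(|s| {
        // This thread only "owns" buf[0], but `read` loads all of buf[0..4].
        s.spawn(|| { let _ = read(&a[0]); });
        // Plain, non-atomic write to buf[1]: together with the overlapping
        // load above, this is a data race and hence UB under C11.
        s.spawn(|| { b[0] = 42; });
    });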

Also note that "vol" is a bad name (assuming you meant this as short-hand for "volatile") -- in C, atomicity has nothing to do with volatile! It is literally impossible to write correct concurrent code when using volatile and not atomics. Unfortunately, Java's volatile is about atomicity, but that's a very different volatile than the one in C.

And finally, write also introduces a data race between an atomic read-modify-update and a non-atomic write in the other thread, so it is UB in C11 as well. And this time it is also UB in LLVM: Another thread could be reading from one of the extra locations that write affects, so calling write would introduce a data race between our writing and the other thread's reading. LLVM specifies that in this case, the read returns undef. So, calling write can make safe accesses to the same location in other threads return undef, and subsequently trigger UB.
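
Sketched in the same style (write is the function from the question):

    #[repr(align(4))]
    struct Aligned([u8; 4]);

    let mut buf = Aligned([0; 4]);
    let (a, b) = buf.0.split_at_mut(1);
    std::thread::scope(|s| {
        // Atomic read-modify-write of the whole word, i.e. it also stores buf[1].
        s.spawn(|| write(&mut a[0], 7));
        // Non-atomic read of buf[1]: it races with the CAS above, so per LLVM
        // it returns undef, even though the value of that byte never changes.
        s.spawn(|| { let _ = b[0]; });
    });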

Do we have any examples of issues caused by these functions?

The frustrating part is, while I found multiple reasons to rule out your functions following the spec(s), there seems to be no good reason that these functions are ruled out! The read and read_vol concurrency issues are fixed by LLVM's model (which however has other problems, compared to C11), but write is illegal in LLVM just because read-write data races make the read return undef -- and in this case we know we are writing the same value that was already stored in these other bytes! Couldn't LLVM just say that in this special case (writing the value that's already there), the read must return that value? Probably yes, but this stuff is subtle enough that I would also not be surprised if that invalidates some obscure optimization.

Moreover, at least on non-embedded platforms the out-of-bounds accesses done by read are unlikely to cause actual trouble. I guess one could imagine a semantics which returns undef when reading an out-of-bounds byte that is guaranteed to sit on the same page as an in-bounds byte. But that would still leave write illegal, and that is a really tough one: write can only be allowed if the memory on these other locations is left absolutely unchanged. There could be arbitrary data sitting there from other allocations, parts of the stack frame, whatever. So somehow the formal model would have to let you read those other bytes, not allow you to gain anything by inspecting them, but also verify that you are not changing the bytes before writing them back with a CAS. I'm not aware of any model that would let you do that. But I thank you for bringing these nasty cases to my attention, it's always good to know that there is still plenty of stuff left to research in terms of memory models :)

Rust's aliasing rules

Finally, what you were probably wondering about is whether these functions violate any of the additional aliasing rules that Rust adds. The trouble is, we don't know -- these rules are still under development. However, all the proposals I have seen so far would indeed rule out your functions: When you hold an &mut u8 (say, one that points right next to the one that's passed to read/read_vol/write), the aliasing rules provide a guarantee that no access whatsoever will happen to that byte by anyone but you. So, your functions reading from memory that others could hold a &mut u8 to already makes them violate the aliasing rules.
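
Concretely, something like this already falls afoul of those proposed rules, even without a second thread (again using read from the question):

    #[repr(align(4))]
    struct Aligned([u8; 4]);

    let mut buf = Aligned([0; 4]);
    let (a, b) = buf.0.split_at_mut(1);
    let exclusive: &mut u8 = &mut b[0]; // &mut to buf[1]: nobody else may access that byte
    // `read` loads the aligned word buf[0..4], i.e. it also reads the byte
    // behind `exclusive` -- exactly what the proposed aliasing rules forbid.
    let _ = read(&a[0]);
    *exclusive += 1;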

However, the motivation for these rules is to conform with the C11 concurrency model and LLVM's rules for memory access. If LLVM declares something UB, we have to make it UB in Rust as well, unless we are willing to change our codegen in a way that avoids the UB (and typically sacrifices performance). Moreover, given that Rust adopted the C11 concurrency model, the same holds true for that. So for these cases, the aliasing rules really don't have any choice but to make these accesses illegal. We could revisit this once we have a more permissive memory model, but right now our hands are tied.

answered Nov 02 '22 by Ralf Jung