What are the semantics for dereferencing raw pointers?

Tags:

rust

For shared references and mutable references the semantics are clear: as long as you have a shared reference to a value, nothing else may have mutable access to it, and a mutable reference can't be shared.

So this code:

#[no_mangle]
pub extern fn run_ref(a: &i32, b: &mut i32) -> (i32, i32) {
    let x = *a;
    *b = 1;
    let y = *a;
    (x, y)
}

compiles (on x86_64) to:

run_ref:
    movl    (%rdi), %ecx
    movl    $1, (%rsi)
    movq    %rcx, %rax
    shlq    $32, %rax
    orq     %rcx, %rax
    retq

Note that the memory a points to is only read once, because the compiler knows the write to b must not have modified the memory at a.

Raw pointers are more complicated. Raw pointer arithmetic and casts are "safe", but dereferencing them is not.

We can convert raw pointers back to shared and mutable references, and then use them; this will certainly imply the usual reference semantics, and the compiler can optimize accordingly.
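As a minimal sketch of that conversion (the function name `run_via_refs` and its safety contract are assumptions made for illustration, not part of the original code), the raw pointers can be turned into references first, at which point the caller has to promise what the borrow checker would normally enforce:

```rust
/// Hypothetical variant of `run_ref` that receives raw pointers and
/// converts them to references before using them.
///
/// # Safety
/// `a` and `b` must be valid, properly aligned, and must NOT overlap,
/// because `&i32` and `&mut i32` to the same location may not coexist.
pub unsafe fn run_via_refs(a: *const i32, b: *mut i32) -> (i32, i32) {
    // From here on the usual reference rules apply to `a_ref` and `b_ref`.
    let a_ref: &i32 = unsafe { &*a };
    let b_ref: &mut i32 = unsafe { &mut *b };

    let x = *a_ref;
    *b_ref = 1;
    // The compiler may assume this read yields the same value as `x`.
    let y = *a_ref;
    (x, y)
}
```

Calling it with two distinct locals, for example `unsafe { run_via_refs(&a, &mut b) }`, satisfies the contract; passing the same address twice would not.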

But what are the semantics if we use raw pointers directly?

#[no_mangle]
pub unsafe extern fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}

compiles to:

run_ptr_direct:
    movl    (%rdi), %ecx
    movl    $1065353216, (%rsi)
    movl    (%rdi), %eax
    shlq    $32, %rax
    orq     %rcx, %rax
    retq

Although we write a value of a different type, the second read still goes to memory, so it seems to be allowed to call this function with the same (or overlapping) memory location for both arguments. In other words, a const raw pointer does not forbid a coexisting mut raw pointer; and it's probably fine to have two mut raw pointers (of possibly different types) to the same (or overlapping) memory location too.
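To make the overlapping case concrete, here is a small sketch that calls the function with the same location for both arguments. The helper `call_with_same_location` is hypothetical, and the sketch assumes, as the generated assembly suggests, that such a call is permitted:

```rust
/// Same body as in the question, without the FFI attributes.
///
/// # Safety
/// `a` and `b` must be valid and aligned for reads and writes respectively.
pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = unsafe { *a };
    unsafe { *b = 1.0 };
    let y = unsafe { *a };
    (x, y)
}

/// Hypothetical helper: passes one location as both `a` and `b`.
pub fn call_with_same_location() -> (i32, i32) {
    let mut value: i32 = 7;
    // Both arguments are derived from the same raw pointer.
    let p: *mut i32 = &mut value;
    unsafe { run_ptr_direct(p as *const i32, p as *mut f32) }
}
```

The first read sees the original integer; the second read sees the bit pattern of `1.0f32`, which is the constant 1065353216 visible in the assembly above.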

Note that a normal optimizing C/C++ compiler would eliminate the second read (due to the "strict aliasing" rule: modifying/reading the same memory location through pointers of different ("incompatible") types is UB in most cases):

struct tuple { int x; int y; };

extern "C" tuple run_ptr(int const* a, float* b) {
    int const x = *a;
    *b = 1.0;
    int const y = *a;
    return tuple{x, y};
}

compiles to:

run_ptr:
    movl    (%rdi), %eax
    movl    $0x3f800000, (%rsi)
    movq    %rax, %rdx
    salq    $32, %rdx
    orq     %rdx, %rax
    ret

Playground with Rust code examples

godbolt Compiler Explorer with C++ example

So: What are the semantics if we use raw pointers directly: is it ok for referenced data to overlap?

This should have direct implications on whether the compiler is allowed to reorder memory accesses through raw pointers.


asked Feb 20 '18 09:02

Stefan

1 Answer

No awkward strict-aliasing here

C++ strict-aliasing is a patch on a wooden leg. C++ does not have any aliasing information, and the absence of aliasing information prevents a number of optimizations (as you noted here); to regain some performance, strict-aliasing was patched on.

Unfortunately, strict-aliasing is awkward in a systems language, because reinterpreting raw memory is the essence of what systems languages are designed to do.

And doubly unfortunately, it does not enable that many optimizations. For example, copying from one array to another of the same element type must still assume that the arrays may overlap.
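For comparison, Rust spells out the two cases explicitly in its standard library: `std::ptr::copy` tolerates overlap, while `std::ptr::copy_nonoverlapping` requires disjoint ranges. The sketch below uses two hypothetical helpers (`shift_right_by_one`, `copy_disjoint`) to show where the no-overlap knowledge comes from in each case:

```rust
/// Hypothetical helper: shifts every element one slot to the right inside
/// the same buffer, so source and destination ranges overlap.
pub fn shift_right_by_one(buf: &mut [i32]) {
    let len = buf.len();
    if len < 2 {
        return;
    }
    let base = buf.as_mut_ptr();
    // SAFETY: both ranges lie inside `buf`; `copy` permits overlap.
    unsafe { std::ptr::copy(base, base.add(1), len - 1) };
}

/// Hypothetical helper: copies between two slices. A `&[i32]` and a
/// `&mut [i32]` cannot overlap, so the non-overlapping copy is justified
/// by the reference types alone.
pub fn copy_disjoint(src: &[i32], dst: &mut [i32]) {
    let n = src.len().min(dst.len());
    // SAFETY: `src` and `dst` are disjoint and both hold at least `n` items.
    unsafe { std::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), n) };
}
```

In safe code one would normally reach for `copy_within` or `copy_from_slice` instead; the raw-pointer calls are used here only to make the overlap requirement visible.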

restrict (from C) is a bit more helpful, although it only applies to one level at a time.

Instead, we have scope-based aliasing analysis

The essence of the aliasing analysis in Rust is based on lexical scopes (barring threads).

The beginner level explanation that you probably know is:

if you have a &T, then there is no &mut T to the same instance,
if you have a &mut T, then there is no &T or &mut T to the same instance.

As befits a beginner-level explanation, it is slightly abbreviated. For example:

fn main() {
    let mut i = 32;
    let mut_ref = &mut i;
    let x: &i32 = mut_ref;

    println!("{}", x);
}

is perfectly fine, even though both a &mut i32 (mut_ref) and a &i32 (x) point to the same instance!

If you try to access mut_ref after forming x, however, the truth is unveiled:

fn main() {
    let mut i = 32;
    let mut_ref = &mut i;
    let x: &i32 = mut_ref;
    *mut_ref = 2;
    println!("{}", x);
}

error[E0506]: cannot assign to `*mut_ref` because it is borrowed
  |
4 |         let x: &i32 = mut_ref;
  |                       ------- borrow of `*mut_ref` occurs here
5 |         *mut_ref = 2;
  |         ^^^^^^^^^^^^ assignment to borrowed `*mut_ref` occurs here

So, it is fine to have both &mut T and &T pointing to the same memory location at the same time; however mutating through the &mut T will be disabled for as long as the &T exists.

In a sense, the &mut T is temporarily downgraded to a &T.
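The downgrade really is temporary. As a sketch (the function name is made up for illustration, and it assumes a compiler with non-lexical lifetimes, which ends the shared borrow at its last use rather than at the end of the scope), writing through the `&mut` works again once the `&T` is no longer used:

```rust
/// Hypothetical example: the shared reborrow `x` ends at its last use,
/// after which `mut_ref` regains its write access.
pub fn reborrow_then_write() -> (i32, i32) {
    let mut i = 32;
    let mut_ref = &mut i;

    // Implicit reborrow: `x` borrows from `*mut_ref`, it does not move it.
    let x: &i32 = mut_ref;
    let seen = *x; // last use of `x`

    // No `&i32` is live any more, so mutation is allowed again.
    *mut_ref = 2;

    (seen, i)
}
```

On a compiler with purely lexical borrow scopes, the same effect would require wrapping the two lines that use `x` in their own block.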

So, what of pointers?

First of all, let's review what the reference says about raw pointers. They:

are not guaranteed to point to valid memory and are not even guaranteed to be non-NULL (unlike both Box and &);

do not have any automatic clean-up, unlike Box, and so require manual resource management;

are plain-old-data, that is, they don't move ownership, again unlike Box, hence the Rust compiler cannot protect against bugs like use-after-free;

lack any form of lifetimes, unlike &, and so the compiler cannot reason about dangling pointers; and

have no guarantees about aliasing or mutability other than mutation not being allowed directly through a *const T.

Conspicuously absent is any rule forbidding casting a *const T to a *mut T. That's normal, it's allowed, and therefore the last point is really more of a lint, since it can be so easily worked around.
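A short sketch of that workaround follows; `write_through_cast` is a hypothetical name. Note the assumption that makes the write sound: the pointer originally comes from a `&mut i32`. The cast itself is always accepted, but writing through a pointer that was derived from a shared `&i32` would still be undefined behavior.

```rust
/// Hypothetical example: a `*const i32` is cast back to `*mut i32` and
/// written through.
pub fn write_through_cast(target: &mut i32, new_value: i32) {
    // The pointer starts out mutable because it comes from `&mut i32`.
    let original: *mut i32 = target;
    // Coercing to `*const` only changes the static type of the pointer.
    let read_only_view: *const i32 = original;
    // The cast back is an ordinary `as` cast and needs no `unsafe`.
    let writable_again = read_only_view as *mut i32;

    // SAFETY: the pointer is valid, aligned, and derived from `&mut i32`.
    unsafe { *writable_again = new_value };
}
```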

Nomicon

A discussion of unsafe Rust would not be complete without pointing to the Nomicon.

Essentially, the rules of unsafe Rust are rather simple: uphold whatever guarantees the compiler would have if it were safe Rust.

This is not as helpful as it could be, since those rules are not set in stone yet; sorry.

Then, what are the semantics for dereferencing raw pointers?

As far as I know¹:

if you form a reference from the raw pointer (&T or &mut T) then you must ensure that the aliasing rules these references obey are upheld,
if you immediately read/write, this temporarily forms a reference.

That is, provided that the caller had mutable access to the location:

pub unsafe fn run_ptr_direct(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = *a;
    *b = 1.0;
    let y = *a;
    (x, y)
}

should be valid, because *a has type i32, so there is no overlap of lifetime in references.

However, I would expect:

pub unsafe fn run_ptr_modified(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = &*a;
    *b = 1.0;
    let y = *a;
    (*x, y)
}

to be undefined behavior, because x would be live while *b is used to modify its memory.

Note how subtle the change is. It's easy to break invariants in unsafe code.
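Following the same reasoning, one way to keep a reference in the picture without overlapping the write is to confine it to a scope that ends before `*b` is touched. This is only a sketch under the rules as described above (the name `run_ptr_scoped` is an assumption), not a statement of the final semantics:

```rust
/// Hypothetical variant of `run_ptr_modified`: the shared reference is
/// confined to an inner block, so it is gone before the write through `b`.
///
/// # Safety
/// `a` and `b` must be valid and aligned; they are allowed to overlap.
pub unsafe fn run_ptr_scoped(a: *const i32, b: *mut f32) -> (i32, i32) {
    let x = {
        let r: &i32 = unsafe { &*a };
        *r // copy the value out; `r` ends with this block
    };
    unsafe { *b = 1.0 };
    let y = unsafe { *a };
    (x, y)
}
```

Called with the same location for both pointers, `x` holds the original integer and `y` holds the bit pattern of `1.0f32`, exactly as with the direct raw-pointer version.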

¹ And I might be wrong right now, or I may become wrong in the future.


answered Sep 28 '22 07:09

Matthieu M.
