I was experimenting with function pointer magic in Rust and ended up with a code snippet which I have absolutely no explanation for why it compiles and even more, why it runs.
fn foo() {
println!("This is really weird...");
}
fn caller<F>() where F: FnMut() {
let closure_ptr = 0 as *mut F;
let closure = unsafe { &mut *closure_ptr };
closure();
}
fn create<F>(_: F) where F: FnMut() {
caller::<F>();
}
fn main() {
create(foo);
create(|| println!("Okay..."));
let val = 42;
create(|| println!("This will seg fault: {}", val));
}
I cannot explain why foo
is being invoked by casting a null pointer in caller(...)
to an instance of type F
. I would have thought that functions may only be called through corresponding function pointers, but that clearly can't be the case given that the pointer itself is null. With that being said, it seems that I clearly misunderstand an important piece of Rust's type system.
Example on Playground
Rust proves that you are not using invalid pointers. Rust proves that you never have null-pointers. Rust proves that you never have two pointers to the same object, execept if all these pointers are non-mutable (const).
Note: NULL is a value that is currently invalid or absent. Although, Rust does not have NULL, but provides an enum Option which can encode the concept of a value being present or absent. Here, is a part of Rust Generics, whatever the datatype we will mention in T, Some(T) will have the value of the same datatype.
A null pointer is a pointer which points nothing. Some uses of the null pointer are: a) To initialize a pointer variable when that pointer variable isn't assigned any valid memory address yet. b) To pass a null pointer to a function argument when we don't want to pass any valid memory address.
Initializing a pointer to a function, or a pointer in general to NULL helps some developers to make sure their pointer is uninitialized and not equal to a random value, thereby preventing them of dereferencing it by accident.
See also the std::ptr module. Working with raw pointers in Rust is uncommon, typically limited to a few patterns. Raw pointers can be unaligned or null. However, when a raw pointer is dereferenced (using the * operator), it must be non-null and aligned.
Using a managed pointer will activate Rust’s garbage collection mechanism. Borrowed: You’re writing a function, and you need a pointer, but you don’t care about its ownership. If you make the argument a borrowed pointer, callers can send in whatever kind they want. Five exceptions. That’s it. Otherwise, you shouldn’t need them.
If you’re coming to Rust from a language like C or C++, you may be used to passing things by reference, or passing things by pointer. In some langauges, like Java, you can’t even have objects without a pointer to them.
When this function is used during const evaluation, it may return false for pointers that turn out to be null at runtime. Specifically, when a pointer to some memory is offset beyond its bounds in such a way that the resulting pointer is null, the function will still return false.
This program never actually constructs a function pointer at all- it always invokes foo
and those two closures directly.
Every Rust function, whether it's a closure or a fn
item, has a unique, anonymous type. This type implements the Fn
/FnMut
/FnOnce
traits, as appropriate. The anonymous type of a fn
item is zero-sized, just like the type of a closure with no captures.
Thus, the expression create(foo)
instantiates create
's parameter F
with foo
's type- this is not the function pointer type fn()
, but an anonymous, zero-sized type just for foo
. In error messages, rustc calls this type fn() {foo}
, as you can see this error message.
Inside create::<fn() {foo}>
(using the name from the error message), the expression caller::<F>()
forwards this type to caller
without giving it a value of that type.
Finally, in caller::<fn() {foo}>
the expression closure()
desugars to FnMut::call_mut(closure)
. Because closure
has type &mut F
where F
is just the zero-sized type fn() {foo}
, the 0
value of closure
itself is simply never used1, and the program calls foo
directly.
The same logic applies to the closure || println!("Okay...")
, which like foo
has an anonymous zero-sized type, this time called something like [closure@src/main.rs:2:14: 2:36]
.
The second closure is not so lucky- its type is not zero-sized, because it must contain a reference to the variable val
. This time, FnMut::call_mut(closure)
actually needs to dereference closure
to do its job. So it crashes2.
1 Constructing a null reference like this is technically undefined behavior, so the compiler makes no promises about this program's overall behavior. However, replacing 0
with some other "address" with the alignment of F
avoids that problem for zero-sized types like fn() {foo}
, and gives the same behavior!)
2 Again, constructing a null (or dangling) reference is the operation that actually takes the blame here- after that, anything goes. A segfault is just one possibility- a future version of rustc, or the same version when run on a slightly different program, might do something else entirely!
The type of fn foo() {...}
is not a function pointer fn()
, it's actually a unique type specific to foo
. As long as you carry that type along (here as F
), the compiler knows how to call it without needing any extra pointers (a value of such a type carries no data). A closure that doesn't capture anything works the same way. It only gets dicey when the last closure tries to look up val
because you put a 0
where (presumably) the pointer to val
was supposed to be.
You can observe this with size_of
, in the first two calls, the size of closure
is zero, but in the last call with something captured in the closure, the size is 8 (at least on the playground). If the size is 0, the program doesn't have to load anything from the NULL
pointer.
The effective cast of a NULL
pointer to a reference is still undefined behavior, but because of type shenanigans and not because of memory access shenanigans: having references that are really NULL
is in itself illegal, because memory layout of types like Option<&T>
relies on the assumption that the value of a reference is never NULL
. Here's an example of how it can go wrong:
unsafe fn null<T>(_: T) -> &'static mut T {
&mut *(0 as *mut T)
}
fn foo() {
println!("Hello, world!");
}
fn main() {
unsafe {
let x = null(foo);
x(); // prints "Hello, world!"
let y = Some(x);
println!("{:?}", y.is_some()); // prints "false", y is None!
}
}
Given that rust is built on top of LLVM, and that what you're doing is guaranteed UB, you're likely hitting something similar to https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html. This is one of many reasons why safe rust works to eliminate all UB.
Although this is entirely up to UB, here's what I assume might be happening in the two cases:
The type F
is a closure with no data. This is equivalent to a function, which means that F
is a function item. What this means is that the compiler can optimize any call to an F
into a call to whatever function produced F
(without ever making a function pointer). See this for an example of the different names for these things.
The compiler sees that val
is always 42, and hence it can optimize it into a constant. If that's the case, then the closure passed into create
is again a closure with no captured items, and hence we can follow the ideas in #1.
Additionally, I say this is UB, however please note something critical about UB: If you invoke UB and the compiler takes advantage of it in an unexpected way, it is not trying to mess you up, it is trying to optimize your code. UB after all, is about the compiler mis-optimizing things because you've broken some expectations it has. It is hence, completely logical that the compiler optimizes this way. It would also be completely logical that the compiler doesn't optimize this way and instead takes advantage of the UB.
This is "working" because fn() {foo}
and the first closure are zero-sized types. Extended answer:
If this program ends up executed in Miri (Undefined behaviour checker), it ends up failing because NULL pointer is dereferenced. NULL pointer cannot ever be dereferenced, even for zero-sized types. However, undefined behaviour can do anything, so compiler makes no promises about the behavior, and this means it can break in the future release of Rust.
error: Undefined Behavior: memory access failed: 0x0 is not a valid pointer
--> src/main.rs:7:28
|
7 | let closure = unsafe { &mut *closure_ptr };
| ^^^^^^^^^^^^^^^^^ memory access failed: 0x0 is not a valid pointer
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `caller::<fn() {foo}>` at src/main.rs:7:28
note: inside `create::<fn() {foo}>` at src/main.rs:13:5
--> src/main.rs:13:5
|
13 | func_ptr();
| ^^^^^^^^^^
note: inside `main` at src/main.rs:17:5
--> src/main.rs:17:5
|
17 | create(foo);
| ^^^^^^^^^^^
This issue can be easily fixed by writing let closure_ptr = 1 as *mut F;
, then it will only fail on line 22 with the second closure that will segfault.
error: Undefined Behavior: inbounds test failed: 0x1 is not a valid pointer
--> src/main.rs:7:28
|
7 | let closure = unsafe { &mut *closure_ptr };
| ^^^^^^^^^^^^^^^^^ inbounds test failed: 0x1 is not a valid pointer
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: inside `caller::<[closure@src/main.rs:22:12: 22:55 val:&i32]>` at src/main.rs:7:28
note: inside `create::<[closure@src/main.rs:22:12: 22:55 val:&i32]>` at src/main.rs:13:5
--> src/main.rs:13:5
|
13 | func_ptr();
| ^^^^^^^^^^
note: inside `main` at src/main.rs:22:5
--> src/main.rs:22:5
|
22 | create(|| println!("This will seg fault: {}", val));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Why it didn't complain about foo
or || println!("Okay...")
? Well, because they don't store any data. When referring to a function, you don't get a function pointer but rather a zero-sized type representing that specific function - this helps with monomorphization, as each function is distinct. A structure not storing any data can be created from aligned dangling pointer.
However, if you explicitly say the function is a function pointer by saying create::<fn()>(foo)
then the program will stop working.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With