I have the following code. There is a function that takes two int32_t parameters. I take a pointer to it, cast it to a pointer to a function that takes three int8_t parameters, and call it. I expected a runtime error, but the program works fine. Why is this even possible?
main.cpp:
#include <cstdint>
#include <iostream>
#include <typeinfo>
using namespace std;

void f(int32_t a, int32_t b) {
    cout << a << " " << b << endl;
}

int main() {
    cout << typeid(&f).name() << endl;
    auto g = reinterpret_cast<void(*)(int8_t, int8_t, int8_t)>(&f);
    cout << typeid(g).name() << endl;
    g(10, 20, 30);
    return 0;
}
Output:
PFviiE
PFvaaaE
10 20
As far as I can see, the signature of the first function requires two ints and the second requires three chars. A char is smaller than an int, so I wondered why a and b are still equal to 10 and 20.
As others have pointed out, this is undefined behavior, so all bets are off about what may in principle happen. But assuming that you're on an x86 machine, there's a plausible explanation as to why you're seeing this.
On x86-64, g++ doesn't always pass arguments by pushing them onto the stack. Instead, it stashes the first few arguments into registers. If we disassemble the f function, notice that the first few instructions move the arguments out of registers and explicitly onto the stack:
push rbp
mov rbp, rsp
sub rsp, 16
mov DWORD PTR [rbp-4], edi # <--- Here
mov DWORD PTR [rbp-8], esi # <--- Here
# (many lines skipped)
Similarly, notice how the call is generated in main. The arguments are placed into those registers:
mov rax, QWORD PTR [rbp-8]
mov edx, 30 # <--- Here
mov esi, 20 # <--- Here
mov edi, 10 # <--- Here
call rax
Since the entire register is being used to hold the arguments, the size of the arguments isn't relevant here.
Moreover, because these arguments are being passed via registers, there's no concern about resizing the stack in an incorrect way. Some calling conventions (cdecl) leave the caller to do cleanup, while others (stdcall) ask the callee to do cleanup. However, neither really matters here, because the stack isn't touched.
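Since calling through the mismatched pointer type is undefined, the one portable use of such a cast is to convert the pointer back to its real type before calling it. Here is a minimal sketch of that round trip. The function sum2 and the aliases TwoInts and ThreeChars are hypothetical names invented for this illustration; sum2 has the same parameter list as f but returns a value so the result can be checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for f: same parameters, but returns a checkable value.
int32_t sum2(int32_t a, int32_t b) { return a + b; }

using TwoInts    = int32_t (*)(int32_t, int32_t);        // the function's real type
using ThreeChars = int32_t (*)(int8_t, int8_t, int8_t);  // the mismatched type

int32_t call_via_round_trip(int32_t a, int32_t b) {
    // Storing the pointer under a different function pointer type is allowed.
    ThreeChars opaque = reinterpret_cast<ThreeChars>(&sum2);
    // Casting back yields the original pointer value.
    TwoInts restored = reinterpret_cast<TwoInts>(opaque);
    assert(restored == &sum2);
    // Calling through the real type is well defined.
    return restored(a, b);
}
```

Calling call_via_round_trip(10, 20) goes through the restored pointer and returns 30. Calling opaque directly is the undefined case from the question.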
As others have pointed out, it's probably undefined behavior, but old school C programmers know this type of thing to work.
Also, because I can sense the language lawyers drafting their litigation documents and court petitions for what I'm about to say, I'm going to cast a spell of undefined behavior discussion. It's cast by saying undefined behavior three times while tapping my shoes together. And that makes the language lawyers disappear so I can explain why weird things just happen to work without getting sued.
Back to my answer:
Everything I discuss below is compiler-specific behavior. All of my simulations are with Visual Studio compiled as 32-bit x86 code. I suspect it will work the same with gcc and g++ on a similar 32-bit architecture.
Here's why your code just happens to work and some caveats.
When function call arguments get pushed onto the stack, they get pushed in reverse order. When f is invoked normally, the compiler generates code to push the b argument onto the stack before the a argument. This helps facilitate variadic argument functions such as printf. So when your function f is accessing a and b, it's just accessing arguments at the top of the stack. When invoked through g, there was an extra argument pushed to the stack (30), but it got pushed first. 20 was pushed next, followed by 10, which is at the top of the stack. f is only looking at the top two arguments on the stack.
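The push order described above can be modeled with a toy stack. This is only a sketch of the idea, not what the compiler emits: the names Stack, push_args_reversed and read_two_args are made up for the illustration, and a std::vector whose back() is the "top" stands in for the real machine stack.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model of the stack: back() plays the role of the top of the stack.
using Stack = std::vector<int32_t>;

// Caller side: push arguments right to left, so the first argument ends up on top.
Stack push_args_reversed(const std::vector<int32_t>& args) {
    Stack stack;
    for (auto it = args.rbegin(); it != args.rend(); ++it) {
        stack.push_back(*it);
    }
    return stack;
}

// Callee side: a two-parameter function only reads the top two slots.
std::pair<int32_t, int32_t> read_two_args(const Stack& stack) {
    assert(stack.size() >= 2);
    const int32_t a = stack[stack.size() - 1];  // top of the stack
    const int32_t b = stack[stack.size() - 2];  // one slot below the top
    return {a, b};
}
```

With the three arguments 10, 20, 30 pushed this way, read_two_args sees a = 10 and b = 20, and the extra 30 sits underneath where the callee never looks.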
IIRC, at least in classic ANSI C, chars and shorts always get promoted to int before being placed on the stack. That's why, when you invoked it with g, the literals 10 and 20 get placed on the stack as full-sized ints instead of 8-bit ints. However, the moment you redefine f to take 64-bit longs instead of 32-bit ints, the output of your program changes.
void f(int64_t a, int64_t b) {
    cout << a << " " << b << endl;
}
This results in the following output from your main (with my compiler):
85899345930 48435561672736798
And if you convert to hex:
140000000a ac13e30000001e
14 is 20 and 0A is 10. And I suspect that 1e is your 30 getting pushed to the stack. So the arguments got pushed to the stack when invoked through g, but were munged up in some compiler-specific way. (Undefined behavior again, but you can see the arguments got pushed.)
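To make the hex reading concrete, here is a small sketch that splits the printed 64-bit values into their 32-bit halves. The helper names low_half and high_half are made up for this example. It only does arithmetic on the numbers shown in the output above and says nothing about what another compiler would print.

```cpp
#include <cstdint>

// Lower 32 bits of a 64-bit value.
constexpr uint32_t low_half(uint64_t v) {
    return static_cast<uint32_t>(v & 0xFFFFFFFFu);
}

// Upper 32 bits of a 64-bit value.
constexpr uint32_t high_half(uint64_t v) {
    return static_cast<uint32_t>(v >> 32);
}

// First printed value: the two 32-bit stack slots holding 10 and 20.
static_assert(low_half(85899345930ULL) == 10, "low half is the first argument");
static_assert(high_half(85899345930ULL) == 20, "high half is the second argument");

// Second printed value: its low half is the third pushed argument.
static_assert(low_half(48435561672736798ULL) == 30, "low half is the extra argument");
```

The upper half of the second value is whatever happened to be in the next stack slot, so it is not checked here.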
printf has no idea how many arguments you actually passed to it, and it relies on the caller to fix the stack when it returns. So when you invoked through g, the compiler generated code to push 3 integers to the stack, invoke the function, and then pop those same values off. Things change the moment you have the callee clean up the stack instead (a la __stdcall on Visual Studio):
void __stdcall f(int32_t a, int32_t b) {
    cout << a << " " << b << endl;
}
Now you are clearly in undefined behavior territory. Invoking through g pushed three int arguments onto the stack, but the compiler only generated code for f to pop two int arguments off the stack when it returns. The stack pointer is corrupted upon return.
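The bookkeeping behind that corruption can be sketched as simple arithmetic. This is a toy model under the assumption of 4-byte stack slots on 32-bit x86. The names Cleanup and leftover_bytes are hypothetical and do not correspond to anything the compiler exposes.

```cpp
#include <cstdint>

// Who removes the arguments from the stack after the call.
enum class Cleanup { Caller, Callee };

// Bytes still on the toy stack after the call returns, assuming 4-byte slots.
// args_pushed: how many arguments the call site pushed.
// args_callee_pops: how many arguments the callee was compiled to remove.
constexpr int32_t leftover_bytes(int32_t args_pushed, int32_t args_callee_pops, Cleanup who) {
    const int32_t slot = 4;
    const int32_t pushed = args_pushed * slot;
    // With caller cleanup, the call site removes exactly what it pushed.
    const int32_t popped = (who == Cleanup::Caller) ? pushed : args_callee_pops * slot;
    return pushed - popped;
}

// Caller cleanup: three pushed, three removed, nothing left over.
static_assert(leftover_bytes(3, 2, Cleanup::Caller) == 0, "balanced");
// Callee cleanup: three pushed, only two removed, one slot left behind.
static_assert(leftover_bytes(3, 2, Cleanup::Callee) == 4, "stack pointer off by 4");
```

A nonzero result is the model's version of the stack pointer being wrong when control returns to main.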
As others have pointed out, it is entirely undefined behaviour, and what you get will depend on the compiler. It will work only if you have a specific calling convention that uses registers rather than the stack to pass the parameters.
I used Godbolt to see the generated assembly, which you can check in full here.
The relevant function call is here:
mov edi, 10
mov esi, 20
mov edx, 30
call f(int, int) #clang totally knows you're calling f by the way
It doesn't push parameters on the stack, it simply puts them in registers. What is most interesting is that the mov instruction doesn't change just the lower 8 bits of the register, but all of them, as it is a 32-bit move. This also means that no matter what was in the register before, you will always get the right value when you read 32 bits back as f does.
If you wonder why it is a 32-bit move: in almost every case, on an x86 or AMD64 architecture, compilers load literals with either a 32-bit move or a 64-bit move (the latter only if the value is too big for 32 bits). Moving an 8-bit value doesn't zero out the upper bits (8-31) of the register, which can create problems if the value ends up being promoted. Using a 32-bit literal instruction is simpler than adding an instruction to zero out the register first.
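The difference between the two kinds of move can be modeled on a plain 32-bit integer. This is a sketch of the register behavior described above, not real register access. The function names write_low8 and write_all32 are invented for the illustration, and 0xDEADBEEF is just an arbitrary stand-in for stale register contents.

```cpp
#include <cstdint>

// Model of an 8-bit move: only bits 0-7 change, bits 8-31 keep their old contents.
constexpr uint32_t write_low8(uint32_t reg, uint8_t value) {
    return (reg & 0xFFFFFF00u) | value;
}

// Model of a 32-bit move: the old contents are replaced entirely.
constexpr uint32_t write_all32(uint32_t /*old contents, ignored*/, uint32_t value) {
    return value;
}

// After an 8-bit move, reading 32 bits back picks up stale upper bits.
static_assert(write_low8(0xDEADBEEFu, 10) == 0xDEADBE0Au, "stale upper bits survive");
// After a 32-bit move, reading 32 bits back gives exactly the literal.
static_assert(write_all32(0xDEADBEEFu, 10) == 10u, "register holds the literal");
```

In this model, a callee that reads the full 32 bits only sees 10 in the second case, which matches what f observes.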
One thing you have to remember, though, is that it is really trying to call f as if it had 8-bit parameters, so if you pass a large value it will truncate the literal. For example, 1000 will become -24, as the lower 8 bits of 1000 are E8, which is -24 when interpreted as a signed integer. You will also get a warning:
<source>:13:7: warning: implicit conversion from 'int' to 'signed char' changes value from 1000 to -24 [-Wconstant-conversion]
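The truncation arithmetic can be checked by hand. The sketch below keeps the low 8 bits and reinterprets them as a two's-complement signed byte using ordinary integer math, so it does not depend on how a particular compiler narrows to int8_t. The helper name as_signed_byte is made up for this example.

```cpp
#include <cstdint>

// Keep bits 0-7 of value and interpret them as a two's-complement signed byte.
constexpr int32_t as_signed_byte(int32_t value) {
    const int32_t low = value & 0xFF;            // 1000 is 0x3E8, so low is 0xE8 (232)
    return (low >= 0x80) ? low - 0x100 : low;    // 232 - 256 = -24
}

static_assert(as_signed_byte(1000) == -24, "matches the value in the compiler warning");
static_assert(as_signed_byte(10) == 10, "small values pass through unchanged");
```

Values that already fit in a signed byte, like the 10, 20 and 30 in the question, come out unchanged, which is why the truncation goes unnoticed there.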