
Why does this function call behave sensibly after calling it through a typecasted function pointer?

I have the following code. There is a function that takes two int32_t arguments. I take a pointer to it, cast it to a pointer to a function that takes three int8_t arguments, and call it. I expected a runtime error, but the program works fine. Why is this even possible?

main.cpp:

#include <cstdint>   // int8_t, int32_t
#include <iostream>
#include <typeinfo>  // typeid

using namespace std;

void f(int32_t a, int32_t b) {
    cout << a << " " << b << endl;
}

int main() {
    cout << typeid(&f).name() << endl;
    auto g = reinterpret_cast<void(*)(int8_t, int8_t, int8_t)>(&f);
    cout << typeid(g).name() << endl;
    g(10, 20, 30);
    return 0;
}

Output:

PFviiE
PFvaaaE
10 20

As far as I can see, the signature of the first function requires two ints and the second requires three chars. A char is smaller than an int, so I wondered why a and b are still equal to 10 and 20.

Divano asked Jun 23 '19 19:06


3 Answers

As others have pointed out, this is undefined behavior, so all bets are off about what may in principle happen. But assuming that you're on an x86-64 machine, there's a plausible explanation as to why you're seeing this.

On x86-64, the g++ compiler doesn't always pass arguments by pushing them onto the stack. Instead, it stashes the first few arguments into registers. If we disassemble the f function, notice that the first few instructions move the arguments out of registers and explicitly onto the stack:

    push    rbp
    mov     rbp, rsp
    sub     rsp, 16
    mov     DWORD PTR [rbp-4], edi  # <--- Here
    mov     DWORD PTR [rbp-8], esi  # <--- Here
    # (many lines skipped)

Similarly, notice how the call is generated in main. The arguments are placed into those registers:

    mov     rax, QWORD PTR [rbp-8]
    mov     edx, 30      # <--- Here
    mov     esi, 20      # <--- Here
    mov     edi, 10      # <--- Here
    call    rax

Since the entire register is being used to hold the arguments, the size of the arguments isn't relevant here.

Moreover, because these arguments are being passed via registers, there's no concern about resizing the stack in an incorrect way. Some calling conventions (cdecl) leave the caller to do cleanup, while others (stdcall) ask the callee to do cleanup. However, neither really matters here, because the stack isn't touched.

templatetypedef answered Nov 15 '22 07:11


As others have pointed out, it's probably undefined behavior, but old school C programmers know this type of thing to work.

Also, because I can sense the language lawyers drafting their litigation documents and court petitions for what I'm about to say, I'm going to cast a spell of undefined behavior discussion. It's cast by saying undefined behavior three times while tapping my shoes together. And that makes the language lawyers disappear so I can explain why weird things just happen to work without getting sued.

Back to my answer:

Everything I discuss below is compiler specific behavior. All of my simulations are with Visual Studio compiled as 32-bit x86 code. I suspect it will work the same with gcc and g++ on a similar 32-bit architecture.

Here's why your code just happens to work and some caveats.

  1. When function call arguments get pushed onto the stack, they get pushed in reverse order. When f is invoked normally, the compiler generates code to push the b argument onto the stack before the a argument. This helps facilitate variadic functions such as printf. So when your function f accesses a and b, it's just accessing the arguments at the top of the stack. When invoked through g, there was an extra argument pushed to the stack (30), but it got pushed first. 20 was pushed next, followed by 10, which is at the top of the stack. f only looks at the top two arguments on the stack.

  2. IIRC, at least in classic ANSI C, chars and shorts always get promoted to int before being placed on the stack. That's why, when you invoked it with g, the literals 10 and 20 got placed on the stack as full-sized ints instead of 8-bit ints. However, the moment you redefine f to take 64-bit integers instead of 32-bit ints, the output of your program changes.

    void  f(int64_t a, int64_t b) {
        cout << a << " " << b << endl;
    }

This results in the following output from your main (with my compiler):

85899345930 48435561672736798

And if you convert to hex:

140000000a effaf00000001e

14 is 20 and 0A is 10. And I suspect that 1e is your 30 getting pushed to the stack. So the arguments got pushed to the stack when invoked through g, but were munged up in some compiler specific way. (undefined behavior again, but you can see the arguments got pushed).

  3. When you invoke a function, the usual behavior is that the calling code fixes up the stack pointer upon return from the called function. Again, this is for the sake of variadic functions and other legacy reasons for compatibility with K&R C. printf has no idea how many arguments you actually passed to it, and it relies on the caller to fix the stack when it returns. So when you invoked through g, the compiler generated code to push 3 integers to the stack, invoke the function, and then pop those same values off. The moment you change your compiler option to have the callee clean up the stack (à la __stdcall on Visual Studio):

    void  __stdcall f(int32_t a, int32_t b) {
        cout << a << " " << b << endl;
    }

Now you are clearly in undefined behavior territory. Invoking through g pushed three int arguments onto the stack, but the compiler only generated code for f to pop two int arguments off the stack when it returns. The stack pointer is corrupted upon return.

selbie answered Nov 15 '22 07:11


As others have pointed out, it is entirely undefined behaviour, and what you get will depend on the compiler. It will work only if you have a specific calling convention, one that passes the parameters in registers rather than on the stack.

I used Godbolt to look at the generated assembly.

The relevant function call is here:

mov     edi, 10
mov     esi, 20
mov     edx, 30
call    f(int, int) #clang totally knows you're calling f by the way

It doesn't push parameters on the stack, it simply puts them in registers. What is most interesting is that the mov instruction doesn't change just the lower 8 bits of the register, but all of them as it is a 32-bit move. This also means that no matter what was in the register before, you will always get the right value when you read 32 bits back as f does.

If you wonder why it's a 32-bit move: on x86 or AMD64, compilers almost always load a literal with a 32-bit move, or with a 64-bit move if and only if the value is too big for 32 bits. Moving an 8-bit value doesn't zero out the upper bits (8-31) of the register, which can create problems if the value ends up being promoted. Using a 32-bit literal instruction is simpler than having one additional instruction to zero out the register first.

One thing you have to remember, though, is that it is really trying to call f as if it had 8-bit parameters, so if you pass a large value it will truncate the literal. For example, 1000 will become -24, as the lower 8 bits of 1000 are E8, which is -24 when interpreted as a signed integer. You will also get a warning:

<source>:13:7: warning: implicit conversion from 'int' to 'signed char' changes value from 1000 to -24 [-Wconstant-conversion]
meneldal answered Nov 15 '22 06:11