
What prevents the usage of a function argument as hidden pointer?

I am trying to understand the implications of the System V AMD64 ABI's calling convention and am looking at the following example:

struct Vec3{
    double x, y, z;
};

struct Vec3 do_something(void);

void use(struct Vec3 * out){
    *out = do_something();
}

A Vec3 value is of class MEMORY, so the caller (use) must allocate space for the returned value and pass a pointer to it as a hidden argument to the callee (i.e. do_something). This is what we see in the resulting assembly (on godbolt, compiled with -O2):

use:
        pushq   %rbx
        movq    %rdi, %rbx           # remember out
        subq    $32, %rsp            # memory for returned object
        movq    %rsp, %rdi           # hidden pointer to %rdi
        call    do_something
        movdqu  (%rsp), %xmm0        # copy memory to out
        movq    16(%rsp), %rax
        movups  %xmm0, (%rbx)
        movq    %rax, 16(%rbx)
        addq    $32, %rsp            # unwind/restore
        popq    %rbx
        ret

I understand that an alias of the pointer out (e.g. a global variable) could be used in do_something, and thus out cannot be passed as the hidden pointer to do_something: if it were, *out would be modified inside do_something rather than when do_something returns, so some calculations might become faulty. For example, this version of do_something would return faulty results:

struct Vec3 global; //initialized somewhere
struct Vec3 do_something(void){
   struct Vec3 res;
   res.x = 2*global.x; 
   res.y = global.y+global.x; 
   res.z = 0; 
   return res;
}

If out were an alias for the global variable global and were passed as the hidden pointer in %rdi, res would also be an alias of global, because the compiler would use the memory pointed to by the hidden pointer directly (a kind of RVO in C) without creating a temporary object and copying it on return. Then res.y would be 2*x+y (where x and y are the old values of global) and not x+y, as it would be for any other hidden pointer.
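
To make this concrete, here is a minimal sketch in which the hidden pointer is written out as an explicit parameter. The names do_something_lowered, use_with_temp, use_forwarding and demo_alias are hypothetical, and so are the initial values; this only imitates by hand what the ABI lowering does and is not compiler output.

```c
#include <assert.h>

struct Vec3 {
    double x, y, z;
};

struct Vec3 global = {1.0, 10.0, 100.0};

/* Hand-written stand-in for do_something: 'ret' plays the role of the
   hidden pointer, and the result is written directly through it. */
void do_something_lowered(struct Vec3 *ret) {
    ret->x = 2 * global.x;
    ret->y = global.y + global.x;
    ret->z = 0;
}

/* What the compilers emit today: fresh space for the result, then a copy. */
void use_with_temp(struct Vec3 *out) {
    struct Vec3 tmp;
    do_something_lowered(&tmp);
    *out = tmp;
}

/* The hoped-for version: forward 'out' as the hidden pointer. */
void use_forwarding(struct Vec3 *out) {
    do_something_lowered(out);
}

/* Runs both variants with out == &global and checks the results. */
void demo_alias(void) {
    global = (struct Vec3){1.0, 10.0, 100.0};
    use_with_temp(&global);
    assert(global.x == 2.0 && global.y == 11.0);   /* y is x + y */

    global = (struct Vec3){1.0, 10.0, 100.0};
    use_forwarding(&global);
    assert(global.x == 2.0 && global.y == 12.0);   /* y is 2*x + y */
}
```

Calling demo_alias() from main shows the difference: with the temporary, y ends up as 11, while forwarding the aliased pointer gives 12 because global.x has already been overwritten when res.y is computed.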

It was suggested to me that using restrict should solve the problem, i.e.

void use(struct Vec3 *restrict out){
    *out = do_something();
}

because now the compiler knows that there are no aliases of out which could be used in do_something, so the assembly could be as simple as this:

use:
    jmp     do_something # %rdi is now the hidden pointer

However, this is the case for neither gcc nor clang: the assembly stays unchanged (see on godbolt).

What prevents the usage of out as hidden pointer?


NB: The desired (or very similar) behavior would be achieved for a slightly different function-signature:

struct Vec3 use_v2(){
    return do_something();
}

which results in (see on godbolt):

use_v2:
    pushq   %r12
    movq    %rdi, %r12
    call    do_something
    movq    %r12, %rax
    popq    %r12
    ret

asked Aug 06 '19 13:08 by ead


2 Answers

A function is allowed to assume its return-value object (pointed to by a hidden pointer) is not the same object as anything else, i.e. that its output pointer (passed as a hidden first arg) doesn't alias anything.

You could think of this as the hidden first arg output pointer having an implicit restrict on it. (Because in the C abstract machine, the return value is a separate object, and the x86-64 System V specifies that the caller provides space. x86-64 SysV doesn't give the caller license to introduce aliasing.)

Using an otherwise-private local as the destination (instead of separate dedicated space and then copying to a real local) is fine, but pointers that may point to something reachable another way must not be used. This requires escape analysis to make sure that a pointer to such a local hasn't been passed outside of the function.

I think the x86-64 SysV calling convention models the C abstract machine here by having the caller provide a real return-value object. It does not force the callee to invent that temporary when needed to make sure all the writes to the retval happen after any other writes. Putting that burden on the callee is not what "the caller provides space for the return value" means, IMO.

That's definitely how GCC and other compilers interpret it in practice, which is a big part of what matters in a calling convention that's been around this long (since a year or two before the first AMD64 silicon, so very early 2000s).


Here's a case where your optimization would break if it were done:

struct Vec3{
    double x, y, z;
};
struct Vec3 glob3;

__attribute__((noinline))
struct Vec3 do_something(void) {  // copy glob3 to retval in some order
    return (struct Vec3){glob3.y, glob3.z, glob3.x};
}

__attribute__((noinline))
void use(struct Vec3 * out){   // copy do_something() result to *out
    *out = do_something();
}


void caller(void) {
    use(&glob3);
}

With the optimization you're suggesting, do_something's output object would be glob3. But it also reads glob3.

A valid implementation for do_something would be to copy elements from glob3 to (%rdi) in source order, which would do glob3.x = glob3.y before reading glob3.x as the 3rd element of the return value.

That is in fact exactly what gcc -O1 does (Godbolt compiler explorer)

do_something:
    movq    %rdi, %rax               # tmp90, .result_ptr
    movsd   glob3+8(%rip), %xmm0      # glob3.y, glob3.y
    movsd   %xmm0, (%rdi)             # glob3.y, <retval>.x
    movsd   glob3+16(%rip), %xmm0     # glob3.z, _2
    movsd   %xmm0, 8(%rdi)            # _2, <retval>.y
    movsd   glob3(%rip), %xmm0        # glob3.x, _3
    movsd   %xmm0, 16(%rdi)           # _3, <retval>.z
    ret     

Notice the glob3.y, <retval>.x store before the load of glob3.x.

So without restrict anywhere in the source, GCC already emits asm for do_something that assumes no aliasing between the retval and glob3.
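
The same effect can be reproduced in portable C by spelling out the hidden pointer as an ordinary parameter. This is only a hand-written stand-in for the asm above (do_something_via and demo_rotate are made-up names, and the values 1, 2, 3 are arbitrary), with the stores in the same order as GCC emitted them.

```c
#include <assert.h>

struct Vec3 {
    double x, y, z;
};

struct Vec3 glob3;

/* Stand-in for the compiled do_something: 'ret' is the hidden pointer,
   and each element is stored as soon as it has been loaded. */
void do_something_via(struct Vec3 *ret) {
    ret->x = glob3.y;
    ret->y = glob3.z;
    ret->z = glob3.x;   /* reads glob3.x after ret->x was stored */
}

void demo_rotate(void) {
    struct Vec3 tmp;

    /* Separate return-value object: the result is the rotated vector. */
    glob3 = (struct Vec3){1.0, 2.0, 3.0};
    do_something_via(&tmp);
    assert(tmp.x == 2.0 && tmp.y == 3.0 && tmp.z == 1.0);

    /* Return-value object aliases glob3: the old x is lost. */
    glob3 = (struct Vec3){1.0, 2.0, 3.0};
    do_something_via(&glob3);
    assert(glob3.x == 2.0 && glob3.y == 3.0 && glob3.z == 2.0);
}
```

With a separate object the result is {2, 3, 1}; with the aliased one it is {2, 3, 2}, which is why the caller must not hand over a pointer that the callee can reach some other way.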


I don't think using struct Vec3 *restrict out would help at all: that only tells the compiler that inside use() you won't access the *out object through any other name. Since use() doesn't reference glob3, it's not UB to pass &glob3 as an arg to a restrict version of use.

I may be wrong here; @M.M argues in comments that *restrict out might make this optimization safe because the execution of do_something() happens during use(). (Compilers still don't actually do it, but maybe they would be allowed to for restrict pointers.)

Update: Richard Biener said in the GCC missed-optimization bug-report that M.M is correct, and if the compiler can prove that the function returns normally (not exception or longjmp), the optimization is legal in theory (but still not something GCC is likely to look for):

If so, restrict would make this optimization safe if we can prove that do_something is "noexcept" and doesn't longjmp.

Yes.

There's a noexcept declaration, but there isn't (AFAIK) a nolongjmp declaration you can put on a prototype.

So that means it's only possible (even in theory) as an inter-procedural optimization when we can see the other function's body. Unless noexcept also means no longjmp.

answered Oct 21 '22 21:10 by Peter Cordes


The answers of @JohnBollinger and @PeterCordes cleared up a lot of things for me, but I decided to bug the gcc developers as well. Here is how I understand their answer.

As @PeterCordes has pointed out, the callee assumes that the hidden pointer is restrict. However, it also makes another (less obvious) assumption: the memory to which the hidden pointer points is uninitialized.

Why this is important is probably easier to see with the help of a C++ example:

struct Vec3 do_something(void){
   struct Vec3 res;
   res.x = 0.0; 
   res.y = func_which_throws(); 
   res.z = 0.0; 
   return res;
}

do_something writes directly to the memory pointed to by %rdi (as shown in the multiple listings in this Q&A), and it is allowed to do so only because this memory is uninitialized: if func_which_throws() throws and the exception is caught somewhere, nobody will know that we have changed only the x component of the result, because nobody knows which value it had before it was passed to do_something (nobody could have read the original value, because that would be UB).

The above would break if the out pointer were passed as the hidden pointer, because it could then be observed that only a part of the memory, and not all of it, was changed when an exception is thrown and caught.

Now, C has something similar to C++'s exceptions: setjmp and longjmp. I had never heard of them before, but compared to the C++ example, setjmp is best described as try ... catch ... and longjmp as throw.

This means that in C, too, we must ensure that the space provided by the caller is uninitialized.
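
Here is a rough C sketch of that situation, with the hidden pointer again written as an explicit parameter. Everything in it is made up for illustration (func_which_jumps, do_something_via, demo_partial_write, the jmp_buf env and the values), and the target is a static object so that its contents are well-defined after the longjmp.

```c
#include <assert.h>
#include <setjmp.h>

struct Vec3 {
    double x, y, z;
};

static jmp_buf env;

/* Plays the role of func_which_throws: bails out via longjmp if asked to. */
double func_which_jumps(int fail) {
    if (fail)
        longjmp(env, 1);
    return 0.0;
}

/* Stand-in for do_something: 'ret' is the hidden pointer, written in place. */
void do_something_via(struct Vec3 *ret, int fail) {
    ret->x = 0.0;
    ret->y = func_which_jumps(fail);   /* may never come back */
    ret->z = 0.0;
}

void demo_partial_write(void) {
    static struct Vec3 target;

    target = (struct Vec3){1.0, 2.0, 3.0};
    if (setjmp(env) == 0) {
        /* "try": pass an already initialized object as the hidden pointer */
        do_something_via(&target, 1);
    }
    /* "catch": only x was overwritten, y and z still hold the old values */
    assert(target.x == 0.0 && target.y == 2.0 && target.z == 3.0);
}
```

If target had been fresh, uninitialized space, nobody could legally have looked at it after the jump, and the half-finished write would not be observable.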

Even without setjmp/longjmp there are other issues, among them interoperability with C++ code, which has exceptions, and gcc's -fexceptions option.


Corollary: The desired optimization would be possible if we had a qualifier for uninitialized memory (which we don't have), e.g. uninit; then

void use(struct Vec3 *restrict uninit out);

would do the trick.

answered Oct 21 '22 23:10 by ead