
How to pass structs to C function from x86-64 assembly on Mac (NASM)

From here:

nanosleep((const struct timespec[]){{0, 500000000L}}, NULL);

It passes a struct. I am not sure how to pass structs to the syscall or library functions via registers. Wondering if one can show a hello world example of NASM assembly passing a struct to this syscall.

In addition, if I wrap this function in C, it is no longer a syscall. I would like to know how to write the assembly so it works in that case, preferably. So basically: how do I build up a struct in assembly and pass it to a C function in x86-64 on Mac? There are many C library functions that take structs, so I am interested to see how to generically pass a struct to them.

Lance asked Mar 23 '19 00:03


2 Answers

IIRC, in the x86-64 System V ABI small structures such as this one, when passed by value, are just "exploded" across the regular argument registers; but that isn't what happens here: nanosleep takes a pointer to the structure (and in general, I don't think the syscall calling convention allows passing structures by value at all).

IOW, that code is pretty much equivalent to:

struct timespec ts = {0, 500000000L};
nanosleep(&ts, NULL);
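
A minimal C sketch of that equivalence, assuming a POSIX system where nanosleep is declared in <time.h>. The two function names are made up for illustration; both hand nanosleep nothing more than a pointer to a struct timespec living in memory.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose nanosleep() under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Form used in the question: a compound-literal array that decays to a pointer.
   (Hypothetical helper name.) */
int sleep_with_compound_literal(long nanoseconds)
{
    assert(nanoseconds >= 0 && nanoseconds < 1000000000L);
    return nanosleep((const struct timespec[]){{0, nanoseconds}}, NULL);
}

/* Equivalent form: a named struct whose address is passed explicitly.
   (Hypothetical helper name.) */
int sleep_with_named_struct(long nanoseconds)
{
    assert(nanoseconds >= 0 && nanoseconds < 1000000000L);
    struct timespec ts = { .tv_sec = 0, .tv_nsec = nanoseconds };
    return nanosleep(&ts, NULL);
}
```

Compiling either function with optimization and reading the asm output should show the struct being built on the stack and its address placed in rdi.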

So you'll have to carve out 16 bytes of stack space for ts and fill it in (you may even get away with two pushes), get a pointer to it (with lea, or simply by copying rsp), and pass that pointer as the first parameter to nanosleep (so in rdi, with 0 in rsi for the NULL second argument).

On Linux that would be something like:

push 500000000  ; tv_nsec: the second 8 bytes of ts (ends up at rsp+8)
push 0          ; tv_sec: the first 8 bytes of ts (ends up at rsp)
mov rdi,rsp     ; the stack pointer now points to ts; use it as first arg
xor esi,esi     ; the second arg is NULL
mov eax,35      ; syscall 35 -> nanosleep
syscall
add rsp,16      ; restore the stack

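On macOS, AFAIK, building the struct and passing the pointer should work the same way; what differs is the system-call numbering (BSD-class syscall numbers there carry a 0x2000000 prefix). nanosleep may not be exposed as a raw syscall on macOS at all, so calling the libc function _nanosleep with the same register setup (and the stack 16-byte aligned at the call) is the safer route.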

Matteo Italia answered Oct 08 '22 14:10


If you'd compiled this with a C compiler and looked at the asm output, you'd have seen that it just passes a pointer to the struct.

The C code creates an anonymous array (a compound literal) of type struct timespec[], which is an lvalue, so it's legal for it to "decay" to a pointer when passed to
int nanosleep(const struct timespec *req, struct timespec *rem);

If you look up the system call's man page, you'll see that it takes both args as pointers.

In fact there are no POSIX system calls that take struct args by value. This design choice makes sense because not all calling conventions across all architectures handle passing structs the same way. System-call calling conventions often don't match function-call calling conventions exactly, and typically don't have rules for anything other than integer/pointer types.

System calls are usually limited to 6 args max, with no fallback to stack memory for large args. The kernel needs a generic mechanism to collect the args from user-space and dispatch them to a kernel function from a table of function pointers, so all system calls need to have signatures that are compatible with syscall(uintptr_t a, uintptr_t b, ... uintptr_t f) at an asm level.
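
As a rough illustration of "every argument is a register-sized integer", here is a sketch that uses the libc syscall() wrapper. It assumes Linux with glibc, where SYS_nanosleep is defined; raw_nanosleep is a hypothetical name, and on systems whose headers lack SYS_nanosleep the sketch simply falls back to the library function.

```c
#define _GNU_SOURCE            /* for syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: each argument is reduced to one register-sized
   integer. The struct itself stays in memory; only its address travels. */
long raw_nanosleep(const struct timespec *req, struct timespec *rem)
{
    assert(req != NULL);
    uintptr_t a = (uintptr_t)req;
    uintptr_t b = (uintptr_t)rem;
#ifdef SYS_nanosleep
    return syscall(SYS_nanosleep, a, b);  /* Linux: raw system call */
#else
    (void)a; (void)b;
    return nanosleep(req, rem);           /* no raw syscall number available */
#endif
}
```

A call such as raw_nanosleep(&ts, NULL) passes two pointer-sized values, which is all the kernel entry point needs to see.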

If an OS introduced a system call that took a struct by value, it would have to define the ABI details of passing it on every architecture it supported. This could get tricky, e.g. a 16-byte structure like struct timespec on a 32-bit architecture would take up 4 register-width arg-passing slots. (Assuming times are still 64-bit, otherwise you have the year-2038 rollover problem.)
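
The slot arithmetic can be sketched in a few lines of C. The struct below is a stand-in with 64-bit fields (not the real struct timespec), and arg_slots is a hypothetical helper that only counts register-width slots; real ABIs add alignment and classification rules on top of this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a timespec whose fields are 64-bit on every target
   (hypothetical type, for illustration only). */
struct timespec64_like {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Number of register-width slots needed to pass `bytes` by value,
   rounded up. (Hypothetical helper.) */
size_t arg_slots(size_t bytes, size_t register_bytes)
{
    assert(register_bytes > 0);
    return (bytes + register_bytes - 1) / register_bytes;
}
```

With 4-byte registers, arg_slots(sizeof(struct timespec64_like), 4) gives 4 slots; with 8-byte registers it gives 2.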


As Matteo says, x86-64 System V packs structs up to 16 bytes into up to 2 registers for calling functions. The rules are well documented in the ABI, but it's usually easiest to write a simple test function that stores its args to volatile long x or returns one of them, and compile it with optimization enabled.

e.g. on Godbolt

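// Compiled as C++ (hence "padded s"); in C, write "struct padded s".
#include <stdint.h>
struct padded {
    int16_t a;   // followed by 6 bytes of padding
    int64_t b;
};
int64_t ret_a(int dummy, padded s) { return s.a; }
int64_t ret_b(int dummy, padded s) { return s.b; }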

This compiles for x86-64 System V to the asm below (the int dummy takes EDI), so we can see that the struct is passed in RDX:RSI with the upper 6 bytes of RSI unused (potentially holding garbage), just like the object representation in memory, which has 6 bytes of padding so that the int64_t member has alignof(int64_t) = 8 alignment.

ret_a(int, padded):
        movsx   rax, si
        ret
ret_b(int, padded):
        mov     rax, rdx
        ret
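
The memory layout that the register contents mirror can be checked from C. This sketch assumes a little-endian target with 8-byte alignment for int64_t (such as x86-64 System V); the two helper names are hypothetical. It copies the struct's 16 bytes into two 64-bit words, which is what a caller would load into the two argument registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct padded {
    int16_t a;
    int64_t b;
};

/* Layout assumed by the text: a at offset 0, 6 padding bytes, b at offset 8. */
_Static_assert(sizeof(struct padded) == 16, "expected a 16-byte struct");
_Static_assert(offsetof(struct padded, b) == 8, "expected b at offset 8");

/* Copy the object representation into two eightbytes, mirroring the two
   integer registers a caller would fill. (Hypothetical helper.) */
void padded_to_eightbytes(struct padded s, uint64_t out[2])
{
    memcpy(out, &s, sizeof s);
}

/* Only the low 16 bits of the first eightbyte carry a (little-endian);
   the conversion back to int16_t plays the role of movsx. */
int16_t a_from_first_eightbyte(uint64_t first)
{
    return (int16_t)(uint16_t)(first & 0xFFFFu);
}
```

The upper 48 bits of the first word are padding and may hold anything, which is why the callee sign-extends from si rather than reading all of rsi.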

Writing a caller that puts args in the right registers should be obvious.

Peter Cordes answered Oct 08 '22 12:10