I'm trying to use ptrace to trace all syscalls made by a separate process, be it 32-bit (IA-32) or 64-bit (x86-64). My tracer would run on a 64-bit x86 installation with IA-32 emulation enabled, but ideally would be able to trace both 64-bit and 32-bit applications, including if a 64-bit application forks and execs a 32-bit process.
The issue is that, since 32-bit and 64-bit syscall numbers differ, I need to know whether a process is 32-bit or 64-bit to determine which syscall it used, even if I have the syscall number. There seem to be imperfect methods, like checking /proc/<pid>/exe or (as strace does) the size of the registers struct, but nothing reliable.
Complicating this is the fact that 64-bit processes can switch out of long mode to execute 32-bit code directly. They can also make 32-bit int $0x80 syscalls, which, of course, use the 32-bit syscall numbers. I don't "trust" the processes I trace to not use these tricks, so I want to detect them correctly. And I've independently verified that in at least the latter case, ptrace sees the 32-bit syscall numbers and argument register assignments, not the 64-bit ones.
I poked around in the kernel source and came across the TS_COMPAT flag in arch/x86/include/asm/processor.h, which appears to be set whenever a 32-bit syscall is made by a 64-bit process. The only problem is that I have no idea how to access this flag from userland, or if it is even possible.
I also thought about reading the %cs register and comparing it to $0x23 or $0x33, inspired by this method for switching bitness in a running process. But this only detects 32-bit processes, not necessarily 32-bit syscalls (those made with int $0x80) from a 64-bit process. It's also fragile since it relies on undocumented kernel behavior.
Finally, I noticed that the x86 architecture has a bit for long mode in the Extended Feature Enable Register MSR. But ptrace has no way of reading the MSR from a tracee, and I feel like reading it from within my tracer will be inadequate because my tracer is always running in long mode.
I'm at a loss. Perhaps I could try one of those hacks (at this point I'm leaning towards %cs or the /proc/<pid>/exe method), but I want something durable that will actually distinguish between 32-bit and 64-bit syscalls. How can a process using ptrace under x86-64, which has detected that its tracee made a syscall, reliably determine whether that syscall was made with the 32-bit (int $0x80) or 64-bit (syscall) ABI? Is there some other way for a user process to gain this information about another process that it is authorized to ptrace?
Interesting, I hadn't realized that there wasn't an obvious smarter way that strace could use to correctly decode int 0x80 from 64-bit processes. (This is being worked on; see this answer for links to a proposed kernel patch to add PTRACE_GET_SYSCALL_INFO to the ptrace API. strace 4.26 already supports it on patched kernels.)
Update: strace now supports per-syscall detection. I don't know which mainline kernel version added the feature; I tested on Arch Linux with kernel version 5.5 and strace version 5.5.
e.g. this NASM source assembled into a static executable:
mov eax, 4
int 0x80
mov eax, 60
syscall
gives this trace (from nasm -felf64 foo.asm && ld foo.o && strace ./a.out):
execve("./foo", ["./foo"], 0x7ffcdc233180 /* 51 vars */) = 0
strace: [ Process PID=1262249 runs in 32 bit mode. ]
write(0, NULL, 0) = 0
strace: [ Process PID=1262249 runs in 64 bit mode. ]
exit(0) = ?
+++ exited with 0 +++
strace prints a message every time a system call uses a different ABI bitness than the previous one. Note that the message about "runs in 32 bit mode" is completely wrong; the process is merely using the 32-bit ABI from 64-bit mode. "Mode" has a specific technical meaning for x86-64, and this is not it.
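For a tracer of your own, the PTRACE_GET_SYSCALL_INFO request mentioned earlier reports an AUDIT_ARCH_* value for each syscall stop, which is what tells the two ABIs apart. Here is a minimal, untested sketch, assuming kernel headers new enough to define struct ptrace_syscall_info (the ptrace(2) man page lists the request as available since Linux 5.3) and a tracee that is stopped at a syscall-entry stop. It uses the raw syscall() wrapper with <linux/ptrace.h>, as an assumption to sidestep possible clashes between that header and <sys/ptrace.h>. The names syscall_abi, abi_from_audit_arch and syscall_abi_at_entry are made up for this example.

```c
#define _GNU_SOURCE
#include <linux/audit.h>   /* AUDIT_ARCH_X86_64, AUDIT_ARCH_I386 */
#include <linux/ptrace.h>  /* PTRACE_GET_SYSCALL_INFO, struct ptrace_syscall_info */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

enum syscall_abi { ABI_UNKNOWN, ABI_I386, ABI_X86_64 };

/* Translate the AUDIT_ARCH_* value reported by the kernel into an ABI tag. */
enum syscall_abi abi_from_audit_arch(uint32_t arch)
{
    switch (arch) {
    case AUDIT_ARCH_X86_64:
        return ABI_X86_64;  /* 64-bit syscall numbers and registers */
    case AUDIT_ARCH_I386:
        return ABI_I386;    /* 32-bit syscall numbers and registers */
    default:
        return ABI_UNKNOWN;
    }
}

/* Query a tracee that is stopped at a syscall-entry stop.
 * On success, stores the syscall number in *nr and returns its ABI. */
enum syscall_abi syscall_abi_at_entry(pid_t pid, uint64_t *nr)
{
    struct ptrace_syscall_info info;

    memset(&info, 0, sizeof info);
    if (syscall(SYS_ptrace, PTRACE_GET_SYSCALL_INFO, pid,
                sizeof info, &info) == -1)
        return ABI_UNKNOWN;  /* e.g. the kernel does not know the request */
    if (info.op != PTRACE_SYSCALL_INFO_ENTRY)
        return ABI_UNKNOWN;  /* not a syscall-entry stop */

    *nr = info.entry.nr;
    return abi_from_audit_arch(info.arch);
}
```

A tracer would call syscall_abi_at_entry() after waitpid() reports a syscall stop, then pick the 32-bit or 64-bit syscall table based on the returned tag before decoding *nr.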
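As a workaround, I think you could disassemble the code at RIP (at a syscall stop the saved RIP points just past the instruction, so look at the two bytes before it) and check whether it was the syscall instruction (0F 05) or not, because ptrace does let you read the target process's memory.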
But for a security use-case like disallowing some system calls, this would be vulnerable to a race condition: another thread in the process making the syscall could rewrite the syscall bytes to int 0x80 after they execute, but before you can peek at them with ptrace.
You only need to do that if the process is running in 64-bit mode; otherwise only the 32-bit ABI is available and there is nothing to check. (The vdso page can potentially use the 32-bit-mode syscall instruction on AMD CPUs that support it but not sysenter. Not checking in the first place for 32-bit processes avoids this corner case.) I think you're saying you have a reliable way to detect that, at least.
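The peek-at-the-instruction workaround could be sketched like this. It is an untested illustration, assuming an x86-64 tracer, a tracee stopped at a syscall-entry stop, and that the saved RIP points just past a 2-byte syscall (0F 05) or int 0x80 (CD 80) instruction. The names entry_insn, classify_entry_insn and tracee_entry_insn are made up for this example, and the race condition described above still applies.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

enum entry_insn { INSN_OTHER, INSN_SYSCALL, INSN_INT80 };

/* Classify the two opcode bytes that immediately precede the saved RIP. */
enum entry_insn classify_entry_insn(const uint8_t b[2])
{
    if (b[0] == 0x0f && b[1] == 0x05)
        return INSN_SYSCALL;  /* syscall  */
    if (b[0] == 0xcd && b[1] == 0x80)
        return INSN_INT80;    /* int 0x80 */
    return INSN_OTHER;
}

/* Peek at the instruction a stopped tracee used to enter the kernel.
 * Returns INSN_OTHER if the registers or the memory cannot be read. */
enum entry_insn tracee_entry_insn(pid_t pid)
{
    struct user_regs_struct regs;
    uint8_t bytes[2];
    long word;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return INSN_OTHER;

    /* PEEKTEXT returns the data itself, so -1 alone is ambiguous. */
    errno = 0;
    word = ptrace(PTRACE_PEEKTEXT, pid,
                  (void *)(uintptr_t)(regs.rip - 2), NULL);
    if (word == -1 && errno != 0)
        return INSN_OTHER;

    /* x86 is little-endian: the lowest byte of the word is at rip - 2. */
    bytes[0] = (uint8_t)(word & 0xff);
    bytes[1] = (uint8_t)((word >> 8) & 0xff);
    return classify_entry_insn(bytes);
}
```

Keeping the byte comparison in its own function means it can be checked without a tracee. Per the paragraph above, tracee_entry_insn() only needs to be called for tracees already known to be running in 64-bit mode.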
(I haven't used the ptrace API directly, just the tools like strace that use it. So I hope this answer makes sense.)