It's the same as this one, except that I'm running execl("/bin/ls", "ls", NULL). The result is obviously wrong, as every syscall returns with -38:
[user@ test]# ./test_trace
syscall 59 called with rdi(0), rsi(0), rdx(0)
syscall 12 returned with -38
syscall 12 called with rdi(0), rsi(0), rdx(140737288485480)
syscall 9 returned with -38
syscall 9 called with rdi(0), rsi(4096), rdx(3)
syscall 9 returned with -38
syscall 9 called with rdi(0), rsi(4096), rdx(3)
syscall 21 returned with -38
syscall 21 called with rdi(233257948048), rsi(4), rdx(233257828696)
...
Does anyone know the reason?
UPDATE
Now the problem is:
execve called with rdi(4203214), rsi(140733315680464), rdx(140733315681192)
execve returned with 0
execve returned with 0
...
execve returned 0 twice. Why?
The code doesn't account for the notification of the exec from the child, and so ends up handling syscall entry as syscall exit, and syscall exit as syscall entry. That's why you see "syscall 12 returned" before "syscall 12 called", etc. (-38 is -ENOSYS, which is put into RAX as a default return value by the kernel's syscall entry code.)
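To make the -38 concrete: on Linux, a raw syscall return value in the range -4095..-1 is a negated errno value, so -38 decodes to ENOSYS. Here is a minimal sketch of that decoding. The helper names is_error_rc, rc_to_errno and print_rc are hypothetical and not part of any ptrace API.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: Linux reports a failed syscall in the return
 * register as a negated errno value in the range -4095..-1. */
int is_error_rc(long rc)
{
    return rc < 0 && rc >= -4095;
}

/* Returns the errno value encoded in rc, or 0 if rc is not an error. */
int rc_to_errno(long rc)
{
    int err = is_error_rc(rc) ? (int)-rc : 0;
    assert(err >= 0 && err <= 4095);   /* sanity check on the decoded value */
    return err;
}

/* Print a return code together with its errno text, if it encodes one. */
void print_rc(long rc)
{
    if (is_error_rc(rc))
        printf("rc = %ld (%s)\n", rc, strerror(rc_to_errno(rc)));
    else
        printf("rc = %ld\n", rc);
}
```

Calling print_rc(-38) prints the return code followed by the system's description of ENOSYS, while a value such as 5324800 is printed unchanged because it lies outside the error range.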
As the ptrace(2) man page states:
PTRACE_TRACEME
Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. [...]
You said that the original code you were running was "the same as this one except that I'm running execl("/bin/ls", "ls", NULL);". Well, it clearly isn't, because you're working with x86_64 rather than 32-bit and have changed the messages at least.
But, assuming you didn't change too much else, the first time the wait() wakes up the parent, it's not for syscall entry or exit - the parent hasn't executed ptrace(PTRACE_SYSCALL,...) yet. Instead, you're seeing this notification that the child has performed an exec (on x86_64, syscall 59 is execve).
The code incorrectly interprets that as syscall entry. Then it calls ptrace(PTRACE_SYSCALL,...), and the next time the parent is woken it is for a syscall entry (syscall 12), but the code reports it as syscall exit.
Note that in this original case, you never see the execve syscall entry/exit - only the additional notification - because the parent does not execute ptrace(PTRACE_SYSCALL,...) until after it happens.
If you do arrange the code so that the execve syscall entry/exit are caught, you will see the new behaviour that you observe. The parent will be woken three times: once for execve syscall entry (due to use of ptrace(PTRACE_SYSCALL,...)), once for execve syscall exit (also due to use of ptrace(PTRACE_SYSCALL,...)), and a third time for the exec notification (which happens anyway).
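A minimal sketch of how a tracer could keep these three wakeups apart using only the syscall number and return register it reads at each SIGTRAP. It is a heuristic, not a robust method: it assumes that the return register holds -ENOSYS at a real syscall entry, and that the exec notification still shows execve with a return value of 0. The names stop_tracker and classify_sigtrap are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_execve */

enum stop_kind { STOP_SYSCALL_ENTRY, STOP_SYSCALL_EXIT, STOP_EXEC_NOTIFY };

/* Hypothetical tracer-side state: is the child between entry and exit? */
struct stop_tracker {
    int in_syscall;
};

/* Classify one SIGTRAP stop from the syscall number (nr) and the return
 * register (rc) that the tracer read from the child. */
enum stop_kind classify_sigtrap(struct stop_tracker *t, long nr, long rc)
{
    assert(t != NULL);

    /* Outside a syscall, a real entry would show rc == -ENOSYS. Seeing
     * execve with rc == 0 instead is taken to be the exec notification,
     * which does not change the entry/exit state. */
    if (!t->in_syscall && nr == SYS_execve && rc == 0)
        return STOP_EXEC_NOTIFY;

    if (!t->in_syscall) {
        t->in_syscall = 1;
        return STOP_SYSCALL_ENTRY;
    }

    t->in_syscall = 0;
    return STOP_SYSCALL_EXIT;
}
```

Starting from a zeroed stop_tracker, the same function handles both situations described here: if the first SIGTRAP is the exec notification it is reported as such, and the following stops alternate correctly between entry and exit.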
Here is a complete example (for x86 or x86_64) which takes care to show the behaviour of the exec itself by stopping the child first:
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>

#ifdef __x86_64__
#define SC_NUMBER  (8 * ORIG_RAX)
#define SC_RETCODE (8 * RAX)
#else
#define SC_NUMBER  (4 * ORIG_EAX)
#define SC_RETCODE (4 * EAX)
#endif

static void child(void)
{
    /* Request tracing by parent: */
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);

    /* Stop before doing anything, giving parent a chance to catch the exec: */
    kill(getpid(), SIGSTOP);

    /* Now exec: */
    execl("/bin/ls", "ls", NULL);
}

static void parent(pid_t child_pid)
{
    int status;
    long sc_number, sc_retcode;

    while (1)
    {
        /* Wait for child status to change: */
        wait(&status);

        if (WIFEXITED(status)) {
            printf("Child exit with status %d\n", WEXITSTATUS(status));
            exit(0);
        }
        if (WIFSIGNALED(status)) {
            printf("Child exit due to signal %d\n", WTERMSIG(status));
            exit(0);
        }
        if (!WIFSTOPPED(status)) {
            printf("wait() returned unhandled status 0x%x\n", status);
            exit(0);
        }
        if (WSTOPSIG(status) == SIGTRAP) {
            /* Note that there are *three* reasons why the child might stop
             * with SIGTRAP:
             *  1) syscall entry
             *  2) syscall exit
             *  3) child calls exec
             */
            sc_number = ptrace(PTRACE_PEEKUSER, child_pid, SC_NUMBER, NULL);
            sc_retcode = ptrace(PTRACE_PEEKUSER, child_pid, SC_RETCODE, NULL);
            printf("SIGTRAP: syscall %ld, rc = %ld\n", sc_number, sc_retcode);
        } else {
            printf("Child stopped due to signal %d\n", WSTOPSIG(status));
        }
        fflush(stdout);

        /* Resume child, requesting that it stops again on syscall enter/exit
         * (in addition to any other reason why it might stop):
         */
        ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
    }
}

int main(void)
{
    pid_t pid = fork();

    if (pid == 0)
        child();
    else
        parent(pid);

    return 0;
}
which gives something like this (this is for 64-bit - system call numbers are different for 32-bit; in particular execve is 11, rather than 59):
Child stopped due to signal 19
SIGTRAP: syscall 59, rc = -38
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 63, rc = -38
SIGTRAP: syscall 63, rc = 0
SIGTRAP: syscall 12, rc = -38
SIGTRAP: syscall 12, rc = 5324800
...
Signal 19 is the explicit SIGSTOP; the child stops three times for the execve, as just described above; then twice (entry and exit) for each of the other system calls.
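If guessing from register values feels fragile, Linux also lets the tracer ask for distinguishable stops. The sketch below assumes the tracer sets PTRACE_O_TRACESYSGOOD (syscall stops are reported as SIGTRAP|0x80) and PTRACE_O_TRACEEXEC (the exec is reported as a PTRACE_EVENT_EXEC stop rather than a plain SIGTRAP), as documented in ptrace(2). The function names enable_stop_markers and classify_status are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

enum stop_reason { STOP_SYSCALL, STOP_EXEC_EVENT, STOP_SIGNAL, NOT_STOPPED };

/* Ask the kernel to mark syscall stops and exec stops distinctly.
 * Call this while the child is stopped, e.g. at its initial SIGSTOP. */
long enable_stop_markers(pid_t child_pid)
{
    assert(child_pid > 0);
    return ptrace(PTRACE_SETOPTIONS, child_pid, NULL,
                  (void *)(long)(PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC));
}

/* Classify a wait() status, assuming the options above are in effect. */
enum stop_reason classify_status(int status)
{
    if (!WIFSTOPPED(status))
        return NOT_STOPPED;

    /* Syscall entry or exit: stop signal has bit 7 set. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;

    /* Exec event: the event code sits above the stop signal. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return STOP_EXEC_EVENT;

    /* Anything else is an ordinary signal stop, such as SIGSTOP. */
    return STOP_SIGNAL;
}
```

With this in place the parent would still need its own toggle to tell syscall entry from syscall exit, but the exec stop can no longer be confused with either of them.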
If you're really interested in all the gory details of ptrace(), the best documentation I'm aware of is the README-linux-ptrace file in the strace source. As it says, the "API is complex and has subtle quirks"....