Why does this ptrace program say syscall returned -38?

Tags: c, linux, ptrace

It's the same as this one except that I'm running execl("/bin/ls", "ls", NULL);.

The result is obviously wrong as every syscall returns with -38:

[user@ test]# ./test_trace 
syscall 59 called with rdi(0), rsi(0), rdx(0)
syscall 12 returned with -38
syscall 12 called with rdi(0), rsi(0), rdx(140737288485480)
syscall 9 returned with -38
syscall 9 called with rdi(0), rsi(4096), rdx(3)
syscall 9 returned with -38
syscall 9 called with rdi(0), rsi(4096), rdx(3)
syscall 21 returned with -38
syscall 21 called with rdi(233257948048), rsi(4), rdx(233257828696)
...

Does anyone know the reason?

UPDATE

Now the problem is:

execve called with rdi(4203214), rsi(140733315680464), rdx(140733315681192)
execve returned with 0
execve returned with 0
...

execve returned 0 twice. Why?


asked Sep 22 '11 12:09

lexer

1 Answer

The code doesn't account for the notification of the exec from the child, and so ends up handling syscall entry as syscall exit, and syscall exit as syscall entry. That's why you see "syscall 12 returned" before "syscall 12 called", etc. (-38 is -ENOSYS, which the kernel's syscall entry code puts into RAX as a default return value.)

As the ptrace(2) man page states:

PTRACE_TRACEME

Indicates that this process is to be traced by its parent. Any signal (except SIGKILL) delivered to this process will cause it to stop and its parent to be notified via wait(). Also, all subsequent calls to exec() by this process will cause a SIGTRAP to be sent to it, giving the parent a chance to gain control before the new program begins execution. [...]

You said that the original code you were running was "the same as this one except that I'm running execl("/bin/ls", "ls", NULL);". Well, it clearly isn't, because you're working with x86_64 rather than 32-bit and have changed the messages at least.

But, assuming you didn't change too much else, the first time the wait() wakes up the parent, it's not for syscall entry or exit - the parent hasn't executed ptrace(PTRACE_SYSCALL,...) yet. Instead, you're seeing this notification that the child has performed an exec (on x86_64, syscall 59 is execve).

The code incorrectly interprets that as syscall entry. Then it calls ptrace(PTRACE_SYSCALL,...), and the next time the parent is woken it is for a syscall entry (syscall 12), but the code reports it as syscall exit.

Note that in this original case, you never see the execve syscall entry/exit - only the additional notification - because the parent does not execute ptrace(PTRACE_SYSCALL,...) until after it happens.

If you do arrange the code so that the execve syscall entry/exit are caught, you will see the new behaviour that you observe. The parent will be woken three times: once for execve syscall entry (due to the use of ptrace(PTRACE_SYSCALL,...)), once for execve syscall exit (also due to the use of ptrace(PTRACE_SYSCALL,...)), and a third time for the exec notification (which happens anyway).

Here is a complete example (for x86 or x86_64) which takes care to show the behaviour of the exec itself by stopping the child first:

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <sys/reg.h>

#ifdef __x86_64__
#define SC_NUMBER  (8 * ORIG_RAX)
#define SC_RETCODE (8 * RAX)
#else
#define SC_NUMBER  (4 * ORIG_EAX)
#define SC_RETCODE (4 * EAX)
#endif

static void child(void)
{
    /* Request tracing by parent: */
    ptrace(PTRACE_TRACEME, 0, NULL, NULL);

    /* Stop before doing anything, giving parent a chance to catch the exec: */
    kill(getpid(), SIGSTOP);

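    /* Now exec (the variadic terminator must be a char * null pointer): */
    execl("/bin/ls", "ls", (char *)NULL);

    /* Only reached if execl() fails: */
    perror("execl");
    _exit(EXIT_FAILURE);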
}

static void parent(pid_t child_pid)
{
    int status;
    long sc_number, sc_retcode;

    while (1)
    {
        /* Wait for child status to change: */
        wait(&status);

        if (WIFEXITED(status)) {
            printf("Child exit with status %d\n", WEXITSTATUS(status));
            exit(0);
        }
        if (WIFSIGNALED(status)) {
            printf("Child exit due to signal %d\n", WTERMSIG(status));
            exit(0);
        }
        if (!WIFSTOPPED(status)) {
            printf("wait() returned unhandled status 0x%x\n", status);
            exit(0);
        }
        if (WSTOPSIG(status) == SIGTRAP) {
            /* Note that there are *three* reasons why the child might stop
             * with SIGTRAP:
             *  1) syscall entry
             *  2) syscall exit
             *  3) child calls exec
             */
            sc_number = ptrace(PTRACE_PEEKUSER, child_pid, SC_NUMBER, NULL);
            sc_retcode = ptrace(PTRACE_PEEKUSER, child_pid, SC_RETCODE, NULL);
            printf("SIGTRAP: syscall %ld, rc = %ld\n", sc_number, sc_retcode);
        } else {
            printf("Child stopped due to signal %d\n", WSTOPSIG(status));
        }
        fflush(stdout);

        /* Resume child, requesting that it stops again on syscall enter/exit
         * (in addition to any other reason why it might stop):
         */
        ptrace(PTRACE_SYSCALL, child_pid, NULL, NULL);
    }
}

int main(void)
{
    pid_t pid = fork();

    if (pid == 0)
        child();
    else
        parent(pid);

    return 0;
}

which gives something like this (this is for 64-bit - system call numbers are different for 32-bit; in particular execve is 11, rather than 59):

Child stopped due to signal 19
SIGTRAP: syscall 59, rc = -38
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 59, rc = 0
SIGTRAP: syscall 63, rc = -38
SIGTRAP: syscall 63, rc = 0
SIGTRAP: syscall 12, rc = -38
SIGTRAP: syscall 12, rc = 5324800
...

Signal 19 is the explicit SIGSTOP; the child stops three times for the execve as just described above; then twice (entry and exit) for other system calls.

If you're really interested in all the gory details of ptrace(), the best documentation I'm aware of is the README-linux-ptrace file in the strace source. As it says, the "API is complex and has subtle quirks"....

answered Sep 29 '22 11:09

Matthew Slattery
