The ptrace system call allows the parent process to inspect the attached child. For example, in Linux, strace (which is implemented with the ptrace system call) can inspect the system calls invoked by the child process.
When the attached child process invokes a system call, the ptracing parent process can be notified. But how exactly does that happen? I want to know the technical details behind this mechanism.
Thank you in advance.
ptrace provides a mechanism by which a parent process may observe and control the execution of another process. It can examine and change its core image and registers and is used primarily to implement breakpoint debugging and system call tracing.
Uses. ptrace is used by debuggers (such as gdb and dbx), by tracing tools like strace and ltrace, and by code coverage tools. ptrace is also used by specialized programs to patch running programs, to avoid unfixed bugs or to overcome security features.
Finally, ptrace() is hard to implement correctly and consistently. As a result, there has been a long history of obnoxious bugs associated with it, and user-space code which uses ptrace() tends to become encrusted with non-portable workarounds.
The ptrace() system call provides a means by which a parent process may observe and control the execution of another process, and examine and change its core image and registers. It is primarily used to implement breakpoint debugging and system call tracing.
When the attached child process invokes a system call, the ptracing parent process can be notified. But how exactly does that happen?
The tracer calls ptrace with PTRACE_ATTACH, or the child calls ptrace with the PTRACE_TRACEME option so that its parent becomes the tracer. Either call links the two processes by filling in some fields of their task_struct (kernel/ptrace.c: sys_ptrace): the child gets the PT_PTRACED flag in the ptrace field of struct task_struct, the ptracer process recorded as its parent, and an entry on the ptrace_entry list (see __ptrace_link); the parent records the child in its ptraced list.
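From user space, the PTRACE_TRACEME variant of this handshake can be sketched as follows. This is only a minimal illustration, assuming Linux with glibc; the function name traceme_handshake is made up for this example, and the child stops itself with SIGSTOP so that the parent has a well-defined point at which to take control.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that asks to be traced by its parent
 * and then stops itself. Returns the signal that stopped the child as
 * seen by the tracer, or -1 on error. */
int traceme_handshake(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become a tracee of the parent process. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);          /* stop so the parent can take control */
        _exit(0);
    }

    /* Parent: the stop of a traced child is reported through waitpid(). */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    int sig = WIFSTOPPED(status) ? WSTOPSIG(status) : -1;

    /* Clean up: kill and reap the child. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return sig;
}
```

On a system where ptrace is permitted, the tracer should see the child stopped by SIGSTOP. A tracer that attaches to an already running process would use PTRACE_ATTACH with the target pid instead.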
Then strace calls ptrace with the PTRACE_SYSCALL request to register itself as a syscall debugger, which sets the thread flag TIF_SYSCALL_TRACE in the child process's struct thread_info (by something like set_tsk_thread_flag(child, TIF_SYSCALL_TRACE);). From arch/x86/include/asm/thread_info.h:

    /*
     * thread information flags
     * - these are process state flags that various assembly files
     *   may need to access
     ...*/
    #define TIF_SYSCALL_TRACE       0       /* syscall trace active */
    ...
    #define _TIF_SYSCALL_TRACE      (1 << TIF_SYSCALL_TRACE)

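The two macros differ only in that TIF_SYSCALL_TRACE is a bit index and _TIF_SYSCALL_TRACE is the corresponding bit mask. The relationship can be modelled in plain user-space C. This is a toy model, not kernel code: struct model_thread_info and the model_* helpers are hypothetical stand-ins for struct thread_info and the kernel's flag helpers.

```c
#include <stdbool.h>

#define TIF_SYSCALL_TRACE   0                           /* bit index */
#define _TIF_SYSCALL_TRACE  (1UL << TIF_SYSCALL_TRACE)  /* bit mask  */

/* Hypothetical stand-in for the kernel's struct thread_info. */
struct model_thread_info {
    unsigned long flags;
};

/* Set the flag with the given bit index. */
static void model_set_flag(struct model_thread_info *ti, int bit)
{
    ti->flags |= 1UL << bit;
}

/* Clear the flag with the given bit index. */
static void model_clear_flag(struct model_thread_info *ti, int bit)
{
    ti->flags &= ~(1UL << bit);
}

/* Test the flag with the given bit index. */
static bool model_test_flag(const struct model_thread_info *ti, int bit)
{
    return (ti->flags >> bit) & 1UL;
}
```

C code tests the flag by index, while the assembly entry path can test the whole flags word against the mask with a single instruction.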
On every syscall entry or exit, the architecture-specific syscall entry code checks this _TIF_SYSCALL_TRACE flag (directly in the assembler implementation of the syscall path; for example, on x86, arch/x86/kernel/entry_32.S has jnz syscall_trace_entry in ENTRY(system_call) and similar code in syscall_exit_work). If it is set, the child is temporarily stopped with SIGTRAP, and the ptracer observes that stop through wait/waitpid. This is usually done in syscall_trace_enter and syscall_trace_leave:

    long syscall_trace_enter(struct pt_regs *regs)
    ...
            if ((ret || test_thread_flag(TIF_SYSCALL_TRACE)) &&
                tracehook_report_syscall_entry(regs))
                    ret = -1L;
    ...
    void syscall_trace_leave(struct pt_regs *regs)
    ...
            if (step || test_thread_flag(TIF_SYSCALL_TRACE))
                    tracehook_report_syscall_exit(regs, step);

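Seen from the tracer's side, each of these entry and exit reports shows up as one stop of the child. A minimal strace-like loop might look like the sketch below. It assumes Linux with glibc; the function name count_syscall_stops is made up for this example, and PTRACE_O_TRACESYSGOOD is set so that syscall stops can be told apart from other SIGTRAP stops.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: trace a short-lived child and count how many
 * syscall-entry and syscall-exit stops the tracer observes.
 * Returns the count, or -1 on error. */
int count_syscall_stops(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);     /* let the parent set things up first */
        getppid();          /* one ordinary system call to observe */
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFSTOPPED(status))
        return -1;

    /* Ask for syscall stops to be reported as SIGTRAP | 0x80. */
    ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESYSGOOD);

    int stops = 0;
    for (;;) {
        /* Resume the child until its next syscall entry or exit. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == -1)
            break;
        if (waitpid(pid, &status, 0) == -1)
            break;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;          /* the child is gone */
        if (WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80))
            stops++;
        /* Other stops are simply resumed; a real tracer would forward
         * the pending signal instead of dropping it. */
    }
    return stops;
}
```

The exact count depends on what the C library does around raise() and _exit(), but getppid() alone contributes one entry stop and one exit stop.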
The tracehook_report_syscall_* functions are the actual workers here; they call ptrace_report_syscall. From include/linux/tracehook.h:

    /**
     * tracehook_report_syscall_entry - task is about to attempt a system call
     * @regs:               user register state of current task
     *
     * This will be called if %TIF_SYSCALL_TRACE has been set, when the
     * current task has just entered the kernel for a system call.
     * Full user register state is available here.  Changing the values
     * in @regs can affect the system call number and arguments to be tried.
     * It is safe to block here, preventing the system call from beginning.
     *
     * Returns zero normally, or nonzero if the calling arch code should abort
     * the system call.  That must prevent normal entry so no system call is
     * made.  If @task ever returns to user mode after this, its register state
     * is unspecified, but should be something harmless like an %ENOSYS error
     * return.  It should preserve enough information so that syscall_rollback()
     * can work (see asm-generic/syscall.h).
     *
     * Called without locks, just after entering kernel mode.
     */
    static inline __must_check int tracehook_report_syscall_entry(
            struct pt_regs *regs)
    {
            return ptrace_report_syscall(regs);
    }

    /**
     * tracehook_report_syscall_exit - task has just finished a system call
     * @regs:               user register state of current task
     * @step:               nonzero if simulating single-step or block-step
     *
     * This will be called if %TIF_SYSCALL_TRACE has been set, when the
     * current task has just finished an attempted system call.  Full
     * user register state is available here.  It is safe to block here,
     * preventing signals from being processed.
     *
     * If @step is nonzero, this report is also in lieu of the normal
     * trap that would follow the system call instruction because
     * user_enable_block_step() or user_enable_single_step() was used.
     * In this case, %TIF_SYSCALL_TRACE might not be set.
     *
     * Called without locks, just before checking for pending signals.
     */
    static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step)
    {
            ...

            ptrace_report_syscall(regs);
    }

And ptrace_report_syscall generates SIGTRAP for the debugger or strace via ptrace_notify/ptrace_do_notify:

    /*
     * ptrace report for syscall entry and exit looks identical.
     */
    static inline int ptrace_report_syscall(struct pt_regs *regs)
    {
            int ptrace = current->ptrace;

            if (!(ptrace & PT_PTRACED))
                    return 0;

            ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0));

            /*
             * this isn't the same as continuing with a signal, but it will do
             * for normal use.  strace only continues with a signal if the
             * stopping signal is not SIGTRAP.  -brl
             */
            if (current->exit_code) {
                    send_sig(current->exit_code, current, 1);
                    current->exit_code = 0;
            }

            return fatal_signal_pending(current);
    }

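The 0x80 bit in the ptrace_notify argument is what a tracer sees in the wait status when it has enabled PTRACE_O_TRACESYSGOOD. A tracer could classify a status returned by waitpid roughly like this; the helper names are made up for this example and the code is a sketch, not taken from strace.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical helper: true if the wait status describes a syscall stop
 * reported with PTRACE_O_TRACESYSGOOD enabled (SIGTRAP with bit 7 set). */
bool is_sysgood_syscall_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}

/* Hypothetical helper: true for a stop with plain SIGTRAP. Without
 * PTRACE_O_TRACESYSGOOD a syscall stop looks like this too, so it cannot
 * be told apart from other SIGTRAP stops by the status alone. */
bool is_plain_sigtrap_stop(int status)
{
    return WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP;
}
```

This ambiguity of plain SIGTRAP is the reason tracers normally set PTRACE_O_TRACESYSGOOD before resuming the child with PTRACE_SYSCALL.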
ptrace_notify is implemented in kernel/signal.c; it stops the child and passes the siginfo to the ptracer:

    static void ptrace_do_notify(int signr, int exit_code, int why)
    {
            siginfo_t info;

            memset(&info, 0, sizeof info);
            info.si_signo = signr;
            info.si_code = exit_code;
            info.si_pid = task_pid_vnr(current);
            info.si_uid = from_kuid_munged(current_user_ns(), current_uid());

            /* Let the debugger run.  */
            ptrace_stop(exit_code, why, 1, &info);
    }

    void ptrace_notify(int exit_code)
    {
            BUG_ON((exit_code & (0x7f | ~0xffff)) != SIGTRAP);
            if (unlikely(current->task_works))
                    task_work_run();

            spin_lock_irq(&current->sighand->siglock);
            ptrace_do_notify(SIGTRAP, exit_code, CLD_TRAPPED);
            spin_unlock_irq(&current->sighand->siglock);
    }

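The siginfo filled in by ptrace_do_notify can be read back by the tracer with PTRACE_GETSIGINFO while the child is stopped. The sketch below assumes Linux with glibc; the function name siginfo_at_first_syscall_stop is made up for this example. With PTRACE_O_TRACESYSGOOD set, si_signo should be SIGTRAP and si_code should be SIGTRAP | 0x80, matching signr and exit_code in the kernel code.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a child up to its first syscall stop and copy
 * the siginfo recorded for that stop into *out.
 * Returns 0 on success, -1 on error. */
int siginfo_at_first_syscall_stop(siginfo_t *out)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);     /* wait for the parent to take control */
        getppid();          /* guarantees at least one system call */
        _exit(0);
    }

    int status;
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid && WIFSTOPPED(status) &&
        ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACESYSGOOD) == 0 &&
        ptrace(PTRACE_SYSCALL, pid, NULL, NULL) == 0 &&
        waitpid(pid, &status, 0) == pid && WIFSTOPPED(status) &&
        ptrace(PTRACE_GETSIGINFO, pid, NULL, out) == 0)
        rc = 0;

    /* Clean up: kill and reap the child. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return rc;
}
```

A caller would pass a siginfo_t and inspect si_signo and si_code after a successful return.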
ptrace_stop is in the same signal.c file, at line 1839 in 3.13.