How do I trace a system call in Linux?

How would I follow a system call from the trap into the kernel, through how its arguments are passed, how the system call is located in the kernel, and how it is actually processed there, to the return to user space and how state is restored?

luminous12 asked Apr 24 '15 06:04


People also ask

Which tool is useful for tracing the system calls in Linux?

Two common tracing tools on Linux are strace and ltrace. The command man strace displays the full set of available options. The strace tool traces system calls. You can either attach it to a process that is already running or use it to start a new process.

What utility is used to trace system calls?

strace is a Linux utility that lets you trace the system calls that a given application makes.

Where are system calls stored in Linux?

On x86, the number of the system call being invoked is placed in the EAX register (RAX on x86-64), and its arguments are held in other processor registers.


3 Answers

SystemTap

This is the most powerful method I've found so far. It can even show the call arguments: Does ftrace allow capture of system call arguments to the Linux kernel, or only function names?

Usage:

sudo apt-get install systemtap
sudo stap -e 'probe syscall.mkdir { printf("%s[%d] -> %s(%s)\n", execname(), pid(), name, argstr) }'

Then on another terminal:

sudo rm -rf /tmp/a /tmp/b
mkdir /tmp/a
mkdir /tmp/b

Sample output:

mkdir[4590] -> mkdir("/tmp/a", 0777)
mkdir[4593] -> mkdir("/tmp/b", 0777)

Documentation: https://sourceware.org/systemtap/documentation.html

Seems to be kprobes based: https://sourceware.org/systemtap/archpaper.pdf

See also: How to trace just system call events with ftrace without showing any other functions in the Linux kernel?

Tested on Ubuntu 18.04, Linux kernel 4.15.

ltrace -S shows both system calls and library calls

This awesome tool therefore gives even further visibility into what executables are doing.

Here for example I used it to analyze what system calls dlopen is making: https://unix.stackexchange.com/questions/226524/what-system-call-is-used-to-load-libraries-in-linux/462710#462710

ftrace minimal runnable example

Mentioned at https://stackoverflow.com/a/29840482/895245 but here goes a minimal runnable example.

Run with sudo:

#!/bin/sh
set -eux

d=debug/tracing

mkdir -p debug
if ! mountpoint -q debug; then
  mount -t debugfs nodev debug
fi

# Stop tracing.
echo 0 > "${d}/tracing_on"

# Clear previous traces.
echo > "${d}/trace"

# Find the tracer name.
cat "${d}/available_tracers"

# Disable tracing functions, show only system call events.
echo nop > "${d}/current_tracer"

# Find the event name.
grep mkdir "${d}/available_events"

# Enable tracing mkdir.
# Both statements below seem to do the exact same thing,
# just with different interfaces.
# https://www.kernel.org/doc/html/v4.18/trace/events.html
echo sys_enter_mkdir > "${d}/set_event"
# echo 1 > "${d}/events/syscalls/sys_enter_mkdir/enable"

# Start tracing.
echo 1 > "${d}/tracing_on"

# Generate two mkdir calls by two different processes.
rm -rf /tmp/a /tmp/b
mkdir /tmp/a
mkdir /tmp/b

# View the trace.
cat "${d}/trace"

# Stop tracing.
echo 0 > "${d}/tracing_on"

umount debug

Sample output:

# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           mkdir-5619  [005] .... 10249.262531: sys_mkdir(pathname: 7fff93cbfcb0, mode: 1ff)
           mkdir-5620  [003] .... 10249.264613: sys_mkdir(pathname: 7ffcdc91ecb0, mode: 1ff)
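
The columns of the trace lines above can be picked apart with plain shell tools. The following is only a sketch: the helper name summarize_mkdir_events is hypothetical, and it assumes the line layout shown in the sample output (TASK-PID, CPU, flags, timestamp, event). It also converts the mode, which the trace prints in hexadecimal, to octal, so that 1ff can be compared with the 0777 reported by SystemTap.

```shell
#!/bin/sh
# Hypothetical helper (not part of ftrace): read ftrace "trace" text on stdin
# and print "<task> <pid> <mode in octal>" for every sys_mkdir event.
summarize_mkdir_events() {
  # Header lines start with '#', so keeping only event lines drops them.
  grep 'sys_mkdir(' | while read -r task_pid _cpu _flags _timestamp event; do
    task=${task_pid%-*}     # text before the last '-' is the task name
    pid=${task_pid##*-}     # text after the last '-' is the PID
    mode=${event##*mode: }  # keep what follows "mode: " ...
    mode=${mode%%\)*}       # ... up to the closing parenthesis
    # The trace prints the mode in hexadecimal; convert it to octal.
    printf '%s %s 0%o\n' "$task" "$pid" "0x${mode}"
  done
}
```

If the output of cat "${d}/trace" was saved to a file, say trace.txt (an illustrative name), it could be summarized with: summarize_mkdir_events < trace.txt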

One cool thing about this method is that it shows the system calls made by all processes on the system at once, although you can also filter PIDs of interest with set_ftrace_pid (for the function tracers) or set_event_pid (for events such as the one enabled above).
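
For the event-based setup used in the script, PID filtering might look like the sketch below. It assumes the kernel exposes a set_event_pid file in the tracing directory, which is the event counterpart of set_ftrace_pid. The function name trace_events_for_pid and its two parameters are hypothetical and not part of the original answer.

```shell
#!/bin/sh
# Sketch: restrict event tracing to a single PID.
# Assumptions: $1 is the tracing directory (debug/tracing in the script above),
# $2 is the PID of interest, and the kernel provides set_event_pid.
trace_events_for_pid() {
  tracing_dir=$1
  pid=$2
  # Events are only recorded for tasks whose PID is listed in this file.
  echo "$pid" > "${tracing_dir}/set_event_pid"
}
```

It would be called before tracing is switched on, for example: trace_events_for_pid debug/tracing 1234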

Documentation at: https://www.kernel.org/doc/html/v4.18/trace/index.html

Tested on Ubuntu 18.04, Linux kernel 4.15.

GDB step debug the Linux kernel

Depending on the level of internals detail you need, this is an option: How to debug the Linux kernel with GDB and QEMU?

strace minimal runnable example

Here is a minimal runnable example of strace with a freestanding hello world, which makes it clear how everything works: How should strace be used?

More info

  • https://en.pingcap.com/blog/how-to-trace-linux-system-calls-in-production-with-minimal-impact-on-performance might be worth a read. It mentions:

    perf top -F 49 -e raw_syscalls:sys_enter --sort comm,dso --show-nr-samples
    

    and the BPF-based traceloop: https://github.com/kinvolk/traceloop, which the article claims is a very fast method:

    sudo -E ./traceloop cgroups --dump-on-exit /sys/fs/cgroup/system.slice/sshd.service
    

It's actually relatively easy to use ftrace. Here's a classic article by Steven, "Mr. ftrace", Rostedt. The second part is here.

There is a free video by Jan-Simon Möller of the Linux Foundation, and many other good introductory articles that you can find using search terms such as "ftrace tutorial" or "ftrace example".

Jonathan Ben-Avraham answered Sep 28 '22 04:09


You can use the -f and -ff options of strace. Something like this:

strace -f -e trace=process bash -c 'ls; :'

-f Trace child processes as they are created by currently traced processes as a result of the fork(2) system call.

-ff If the -o filename option is in effect, each process's trace is written to filename.pid, where pid is the numeric process ID of that process. This is incompatible with -c, since no per-process counts are kept.
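
To illustrate the per-process files that -ff produces, here is a small sketch. Both function names are hypothetical, and the output prefix is whatever path you pass in. The first function wraps the strace invocation with -ff and -o, and the second lists the PIDs for which a file was written.

```shell
#!/bin/sh
# Hypothetical wrapper: trace process-management calls of a command and its
# children, writing one file per process as PREFIX.<pid>.
trace_process_tree() {
  prefix=$1
  shift
  strace -ff -o "$prefix" -e trace=process "$@"
}

# Hypothetical helper: print the PID suffix of every PREFIX.<pid> file.
list_traced_pids() {
  prefix=$1
  for f in "$prefix".*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    printf '%s\n' "${f##*.}"  # text after the last '.' is the PID
  done
}
```

A possible session: trace_process_tree /tmp/trace.out bash -c 'ls; :' followed by list_traced_pids /tmp/trace.out, after which each /tmp/trace.out.<pid> file can be read on its own.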

Rahul Tripathi answered Sep 28 '22 02:09