I have a core dump file from a process that probably has a file descriptor leak (it opens files and sockets but apparently sometimes forgets to close some of them). Is there a way to find out which files and sockets the process had open when it crashed? I can't easily reproduce the crash, so analyzing the core file seems to be the only way to get a hint about the bug.
In a terminal, run sleep 30 to start a process sleeping for 30 seconds. While it is running, press Ctrl + \ to send it SIGQUIT and force a core dump. If core files are enabled and written to the working directory, you'll now see a core file in the directory you are in.
With a core file, we can use the debugger (GDB) to inspect the state of the process at the moment it was terminated and to identify the line of code that caused the problem. A crash is a situation where a core dump file could be produced, but on many systems it is not written by default.
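One common reason no core file appears is a soft RLIMIT_CORE of 0 (what ulimit -c reports in the shell). That is an assumption about your setup, not something the core file itself will tell you. As a rough sketch, a program can raise its own soft limit to the hard limit at startup; the function name enable_core_dumps is made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit so the kernel is allowed to write a core file for this process.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;

    /* An unprivileged process may raise the soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}
```

If the hard limit is itself 0, this changes nothing, and where the core ends up still depends on the system's core pattern.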
On systemd-based distributions, core dumps are typically sent to systemd-coredump, which can be configured in /etc/systemd/coredump.conf. By default, all core dumps are stored in /var/lib/systemd/coredump (due to Storage=external) and they are compressed with zstd (due to Compress=yes).
You just need a binary (with debugging symbols included) that is identical to the one that generated the core dump file. Then you can run gdb path/to/the/binary path/to/the/core/dump/file to debug it. When it starts up, you can use bt (for backtrace) to get a stack trace from the time of the crash.
If you have a core file and you have compiled the program with debugging options (-g), you can see where the core was dumped:
$ gcc -g -o something something.c
$ ./something
Segmentation fault (core dumped)
$ gdb something core
You can use this to do some post-mortem debugging. A few gdb commands: bt prints the stack, fr N jumps to the given stack frame (see the output of bt for the frame numbers).
Now if you want to see which files are open at a segmentation fault, handle the SIGSEGV signal, and in the handler dump the contents of the /proc/PID/fd directory, where PID is the process ID (e.g. with system("ls -l /proc/PID/fd") or execv).
With this information at hand you can work out what caused the crash, which files were open, and whether the crash and the file descriptor leak are connected.
Your best bet is to install a signal handler for whatever signal is crashing your program (SIGSEGV, etc.).
Then, in the signal handler, inspect /proc/self/fd, and save the contents to a file. Here is a sample of what you might see:
Anderson cxc # ls -l /proc/8247/fd
total 0
lrwx------ 1 root root 64 Sep 12 06:05 0 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 1 -> /dev/pts/0
lrwx------ 1 root root 64 Sep 12 06:05 10 -> anon_inode:[eventpoll]
lrwx------ 1 root root 64 Sep 12 06:05 11 -> socket:[124061]
lrwx------ 1 root root 64 Sep 12 06:05 12 -> socket:[124063]
lrwx------ 1 root root 64 Sep 12 06:05 13 -> socket:[124064]
lrwx------ 1 root root 64 Sep 12 06:05 14 -> /dev/driver0
lr-x------ 1 root root 64 Sep 12 06:05 16 -> /temp/app/whatever.tar.gz
lr-x------ 1 root root 64 Sep 12 06:05 17 -> /dev/urandom
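Then you can return from your signal handler, and you should get a core dump as usual, provided the default action for the signal has been restored first (for example by installing the handler with SA_RESETHAND). Otherwise the faulting instruction is simply retried and the handler is entered again.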
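Here is a minimal sketch of such a handler, not a drop-in implementation. It assumes Linux with /proc mounted. The function names, the output path /tmp/fd_dump.txt and the scan limit of 1024 descriptors are all made up for illustration. It sticks to calls that are async-signal-safe (open, readlink, write, close, raise), which is why it resolves /proc/self/fd/N with readlink instead of calling system() or opendir():

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define MAX_FD_TO_SCAN 1024u  /* assumption: illustrative upper bound */

/* Append the decimal digits of n to buf at offset len; return the new length.
 * Hand-rolled because snprintf is not guaranteed to be async-signal-safe. */
static size_t append_uint(char *buf, size_t len, unsigned n)
{
    char tmp[12];
    size_t i = 0;

    do {
        tmp[i++] = (char)('0' + n % 10);
        n /= 10;
    } while (n > 0);
    while (i > 0)
        buf[len++] = tmp[--i];
    return len;
}

/* Write one "N -> target" line to out_fd for every open descriptor. */
void dump_open_fds(int out_fd)
{
    static const char prefix[] = "/proc/self/fd/";
    const size_t prefix_len = sizeof prefix - 1;
    char path[64];
    char target[512];

    for (unsigned fd = 0; fd < MAX_FD_TO_SCAN; fd++) {
        memcpy(path, prefix, prefix_len);
        size_t len = append_uint(path, prefix_len, fd);
        path[len] = '\0';

        ssize_t n = readlink(path, target, sizeof target - 1);
        if (n < 0)
            continue;            /* this descriptor is not open */
        target[n] = '\n';

        if (write(out_fd, path + prefix_len, len - prefix_len) < 0 ||
            write(out_fd, " -> ", 4) < 0 ||
            write(out_fd, target, (size_t)n + 1) < 0)
            return;
    }
}

/* Save the descriptor list, then let the default action produce the core. */
static void crash_handler(int sig)
{
    /* Hypothetical output path; the dump file's own descriptor shows up
     * in the list as well. */
    int out = open("/tmp/fd_dump.txt", O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (out >= 0) {
        dump_open_fds(out);
        close(out);
    }
    /* SA_RESETHAND has already restored the default action, so re-raising
     * terminates the process (with a core dump, if enabled) on return. */
    raise(sig);
}

/* Install the handler for a few common crash signals; 0 on success. */
int install_fd_dump_handler(void)
{
    static const int signals[] = { SIGSEGV, SIGABRT, SIGBUS, SIGFPE };
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sa.sa_flags = SA_RESETHAND;
    sigemptyset(&sa.sa_mask);

    for (size_t i = 0; i < sizeof signals / sizeof signals[0]; i++)
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    return 0;
}
```

Call install_fd_dump_handler() once near the top of main(). After a crash, /tmp/fd_dump.txt should hold lines in the same "N -> target" form as the ls -l listing above, and the core file is written as usual. If the process may hold more descriptors than the scan limit, raise MAX_FD_TO_SCAN accordingly.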