Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to find the location where a perl script (linux process) is hung

Tags:

process

perl

I have a very hard to reproduce condition where a perl process gets hung. I am not sure where it is hung. ps ax | grep <process name> shows stat column as SN which I understand means it is sleeping and is running at nice priority.

I looked over the script (there is a ton of code in there) but cannot see any particular sleep that lasts over more than a few seconds (this process has been sleeping for more than a day).

I can't restart and add log to the Perl scripts because the condition may not be reproduced. I can try strace but wondering if there is a better mechanism

like image 291
doon Avatar asked Feb 21 '23 16:02

doon


1 Answers

One possible way is to use gdb.

First of all, you need debugging symbols for your perl interpreter. For example, on my Debian system I had to install the perl-debug package for this. After the installation we have /usr/lib/debug/usr/bin/perl, we will pass this to gdb later. Notice that the original, stuck Perl script was started using /usr/bin/perl, not the newly installed debugging version.

For the sake of this example, let's run this Perl script:

$ cat test.pl 
#! /usr/bin/perl

use strict;
use warnings;

print "pid: ", $$, "\n";

while (1) {
  print "line ", __LINE__, "\n"; sleep 1;
  print "line ", __LINE__, "\n"; sleep 1;
}

When we run it, we get an output like:

$ ./test.pl 
pid: 15764
line 9
line 10
line 9
line 10
^C

Now let's start up gdb. Use the pid printed by test.pl running right now. We get a prompt after some initial info ("Reading symbols from ..."):

$ gdb /usr/lib/debug/usr/bin/perl 15809
[snip]
(gdb) 

In the meanwhile, due to attaching gdb to the perl interpreter, perl gets stopped:

$ ./test.pl
pid: 15809
line 9
line 10
[snip]
line 9
line 10
line 9
[no further output]

Now, let's get back to gdb for a backtrace:

(gdb) backtrace
#0  0x00007fd5b4479830 in __nanosleep_nocancel () at ../sysdeps/unix/syscall-template.S:82
#1  0x00007fd5b44796c0 in __sleep (seconds=<optimized out>) at ../sysdeps/unix/sysv/linux/sleep.c:138
#2  0x00007fd5b4efc1e2 in Perl_pp_sleep (my_perl=0x1a91010) at pp_sys.c:4586
#3  0x00007fd5b4ea89b6 in Perl_runops_standard (my_perl=0x1a91010) at run.c:41
#4  0x00007fd5b4e4a585 in S_run_body (oldscope=1, my_perl=0x1a91010) at perl.c:2350
#5  perl_run (my_perl=0x1a91010) at perl.c:2268
#6  0x0000000000400f89 in main (argc=2, argv=0x7fff4de87628, env=0x7fff4de87640) at perlmain.c:120

As it was likely, perl happened to be stopped in the middle of a sleep(). But which one?

Now we need to figure where to look for perl's internal info on the currently executing (Perl) source file and line. Originally I found some relevant info in the doumentation of mod_perl. Look for the curinfo macro in there.

(gdb) p my_perl->Icurcop->cop_file 
$1 = 0x1abd810 "./test.pl"
(gdb) p my_perl->Icurcop->cop_line 
$2 = 9

As we can see we are on line 9 in test.pl - as expected based on the script's output.

The linked documentation mentions a few differences about threaded/non-threaded perl binaries (the example above is for threaded perl, v5.14.2). It also looks a bit outdated, because it talks about my_perl->Tcurcop, while I found what I was looking for under the name of my_perl->Icurcop. At the moment, I'm not familiar enough with the internals of perl to tell why this got renamed.

I hope this helps.

like image 188
rubasov Avatar answered Apr 29 '23 20:04

rubasov