
perlbench results in segfault outside the SPEC 2006 harness

This might be overly specific, but I'm posting it here in case it helps someone else who is trying to compile and run the SPEC 2006 benchmarks outside the default SPEC benchmark harness. (Our reason for doing this is to compare compilation strategies and code coverage, whereas the SPEC harness focuses only on the performance of the resulting code.)

When performing a ref run of perlbench the benchmark crashes with a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
0x00000000004f6868 in S_regmatch (prog=0x832144)
    at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:3024
3024            PL_reg_start_tmp[n] = locinput;
(gdb) bt
#0  0x00000000004f6868 in S_regmatch (prog=0x832144)
    at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:3024
#1  0x00000000004f22cf in S_regtry (prog=0x8320c0, startpos=0x831e70 "o")
    at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:2196
#2  0x00000000004eba71 in Perl_regexec_flags (prog=0x8320c0, stringarg=0x831e70 "o", strend=0x831e71 "", 
    strbeg=0x831e70 "o", minend=0, sv=0x7e2528, data=0x0, flags=3)
    at <path-to-spec>/CPU2006/400.perlbench/src/regexec.c:1910
#3  0x00000000004b33bb in Perl_pp_match ()
    at <path-to-spec>/CPU2006/400.perlbench/src/pp_hot.c:1340
#4  0x00000000004fcde4 in Perl_runops_standard ()
    at <path-to-spec>/CPU2006/400.perlbench/src/run.c:37
#5  0x000000000046bf57 in S_run_body (oldscope=1)
    at <path-to-spec>/CPU2006/400.perlbench/src/perl.c:2017
#6  0x000000000046b9f6 in perl_run (my_perl=0x7bf010)
    at <path-to-spec>/CPU2006/400.perlbench/src/perl.c:1934
#7  0x000000000047add2 in main (argc=4, argv=0x7fffffffe178, env=0x7fffffffe1a0)
    at <path-to-spec>/CPU2006/400.perlbench/src/perlmain.c:98

The execution environment is 64-bit Linux and the behaviour is observed with both the latest gcc and clang.

What causes this crash?

stanm asked Feb 06 '23 20:02


2 Answers

The segfault is caused by a garbage value in the variable n on the faulting line (regexec.c:3024). Inspecting the code shows that the value comes from the field arg1 of an object of type:

struct regnode_1 {
    U8  flags;
    U8  type;
    U16 next_off;
    U32 arg1;
};

Inspecting the memory at the object's address shows that the structure is not packed, i.e. there are 32 bits of padding between next_off and arg1:

(gdb) x/16xb scan
0x7f4978:       0xde    0x2d    0x02    0x00    0x00    0x00    0x00    0x00
0x7f4980:       0x00    0x11    0x0d    0x00    0x00    0x00    0x00    0x00
(gdb) print/x n
$1 = 0xd1100

This is suspicious. There is a lot of pointer and type conversion going on in perlbench, so perhaps an assumption about type sizes fails somewhere. Compiling with multilib (that is, as a 32-bit binary) yields a working benchmark, and examining the memory confirms that there is no padding in that build.

Declaring the members as bit-fields with explicit widths fixes the crash in a 64-bit build:

struct regnode_1 {
    U8  flags : 8;
    U8  type : 8;
    U16 next_off : 16;
    U32 arg1 : 32;
};

stanm answered Mar 05 '23 07:03


This is how our little investigation progressed:

At first we thought it was a padding issue, but as Peter pointed out on Godbolt, no such padding occurs. So whether or not the structure was packed did not change anything.

Then I got suspicious of the (clearly twisted) way that Perl handles pointers. The majority of the casts violate strict aliasing as defined by the standard, and the segmentation fault happened right after one such pointer cast, namely from:

struct regnode {
    U8  flags;
    U8  type;
    U16 next_off;
};

to

struct regnode_1 {
    U8  flags;
    U8  type;
    U16 next_off;
    U32 arg1;
};

However, compiling with the -fstrict-aliasing flag didn't change anything. Although the cast qualifies as undefined behaviour, there is no overlap in memory, since the nodes of the regular expression currently being parsed are laid out separately in memory.

Going deeper and checking the LLVM IR for the switch block in question, I found this in regexec.ll:

; truncated
%876 = load %struct.regnode*, %struct.regnode** %scan, align 8, !dbg !8005
%877 = bitcast %struct.regnode* %876 to %struct.regnode_1*, !dbg !8005
%arg11715 = getelementptr inbounds %struct.regnode_1, %struct.regnode_1* %877, i32 0, i32 3, !dbg !8005
%878 = load i64, i64* %arg11715, align 8, !dbg !8005
store i64 %878, i64* %n, align 8, !dbg !8006
; truncated

The load and store instructions use a 64-bit integer, which means that arg1 is treated as an 8-byte integer instead of a 4-byte one. The load therefore picks up 4 bytes more than a 32-bit arg1 occupies when computing its value. This value is in turn used as an array index, which ultimately causes the segfault when it falls outside the array bounds.

Back to tracing where U32 ends up being a 64-bit unsigned integer. In the file spec_config.h, the conditional compilation leads (at least on my machine) to a preprocessor block that starts with

#elif !defined(SPEC_CPU_GOOFY_DATAMODEL)

which, according to a code comment in the surrounding area, is supposed to correspond to an ILP32 data model. However, U32TYPE is defined as an unsigned long, which on my machine is 64 bits.

So, the fix is to change the definition to

#define U32TYPE uint32_t

which, where the implementation provides it, is guaranteed to be exactly 32 bits wide (the type is declared in <stdint.h>).

compor answered Mar 05 '23 05:03