
Interpreting segfault messages

What is the correct interpretation of the following segfault messages?

segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]
segfault at 10 ip 00007fa44d78890d sp 00007fff43f6b720 error 4 in libQtWebKit.so.4.5.2[7fa44d2f8000+f6f000]
segfault at 11 ip 00007f2b0022acee sp 00007fff368ea610 error 4 in libQtWebKit.so.4.5.2[7f2aff9f7000+f6f000]
segfault at 11 ip 00007f24b21adcee sp 00007fff7379ded0 error 4 in libQtWebKit.so.4.5.2[7f24b197a000+f6f000]
knorv asked Mar 30 '10 22:03



2 Answers

This is a segfault caused by reading through a null (or nearly null) pointer: error 4 is a user-mode read of an unmapped page, and the faulting addresses 0x10 and 0x11 are small offsets from zero.

If this were a program, not a shared library

Run addr2line -e yourSegfaultingProgram 00007f9bebcca90d (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

Since it's a shared library

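The instruction pointer values can't be passed to addr2line directly, because the dynamic linker chooses where each library is loaded at run time. The bracketed base address in the message lets you convert the ip into an offset within the library; failing that, reproduce the problem under gdb.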

What the error means

Here's the breakdown of the fields:

  • address (after the at) - the location in memory the code is trying to access (10 and 11 are hexadecimal, and are likely offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)

  • ip - instruction pointer, i.e. where the code which is trying to do this lives

  • sp - stack pointer

  • error - an error code for page faults; see below for what this means on x86.

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found       1: protection fault
     *   bit 1 ==    0: read access         1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
     *   bit 5 ==                           1: protection keys block access
     *   bit 15 ==                          1: SGX MMU page-fault
     */
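To make the field breakdown above concrete, here is a minimal Python sketch that splits one of these messages into its parts. The regular expression and the function name `parse_segfault` are my own illustration, not anything the kernel provides, and the pattern assumes the x86-64 message layout shown in the question; other kernel versions may format the line differently.

```python
import re

# Assumed layout (taken from the messages in the question):
#   segfault at ADDR ip IP sp SP error CODE in OBJECT[BASE+SIZE]
# All numeric fields are treated as hexadecimal, which is how the x86
# kernel prints them to my understanding.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<address>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+)"
    r"(?: in (?P<object>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def parse_segfault(line):
    """Return the fields of a segfault log line as a dict, or None."""
    match = SEGFAULT_RE.search(line)
    if match is None:
        return None
    fields = {}
    for name, value in match.groupdict().items():
        if value is None:
            fields[name] = None      # the "in OBJECT[...]" part is optional
        elif name == "object":
            fields[name] = value     # keep the library name as text
        else:
            fields[name] = int(value, 16)
    return fields


line = ("segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 "
        "error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]")
for name, value in parse_segfault(line).items():
    print(name, hex(value) if isinstance(value, int) else value)
```

The dict keys mirror the bullet names above, with `object`, `base` and `size` added for the bracketed part of the message.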
Charles Duffy answered Oct 22 '22 09:10


Error 4 means "the cause was a user-mode read resulting in no page being found." There's a tool that decodes it here.

Here's the definition from the kernel. Keep in mind that 4 means that bit 2 is set and no other bits are set. If you convert it to binary that becomes clear.

/*
 * Page fault error code bits
 *      bit 0 == 0 means no page found, 1 means protection fault
 *      bit 1 == 0 means read, 1 means write
 *      bit 2 == 0 means kernel, 1 means user-mode
 *      bit 3 == 1 means use of reserved bit detected
 *      bit 4 == 1 means fault was an instruction fetch
 */
#define PF_PROT         (1<<0)
#define PF_WRITE        (1<<1)
#define PF_USER         (1<<2)
#define PF_RSVD         (1<<3)
#define PF_INSTR        (1<<4)
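If you'd rather not do the binary conversion by hand, the bit tests can be sketched in a few lines of Python. The constants copy the kernel definitions quoted above; the function name `decode_page_fault_error` and the dict layout are just my own choices for illustration.

```python
# Bit masks copied from the kernel definitions quoted above.
PF_PROT = 1 << 0
PF_WRITE = 1 << 1
PF_USER = 1 << 2
PF_RSVD = 1 << 3
PF_INSTR = 1 << 4


def decode_page_fault_error(code):
    """Translate an x86 page fault error code into readable fields."""
    return {
        "cause": "protection fault" if code & PF_PROT else "no page found",
        "access": "write" if code & PF_WRITE else "read",
        "mode": "user" if code & PF_USER else "kernel",
        "reserved_bit": bool(code & PF_RSVD),
        "instruction_fetch": bool(code & PF_INSTR),
    }


# Error 4 is binary 00100: only the user-mode bit is set.
print(decode_page_fault_error(4))
```

For error 4 this reports a user-mode read with no page found and no instruction fetch, which matches the interpretation given at the top of this answer.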

Now then, "ip 00007f9bebcca90d" means the instruction pointer was at 0x00007f9bebcca90d when the segfault happened.

"libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]" tells you:

  • The object the crash was in: "libQtWebKit.so.4.5.2"
  • The base address of that object "7f9beb83a000"
  • How big that object is: "f6f000"

If you take the base address and subtract it from the ip, you get the offset into that object:

0x00007f9bebcca90d - 0x7f9beb83a000 = 0x49090D 

Then you can run addr2line on it:

addr2line -e /usr/lib64/qt45/lib/libQtWebKit.so.4.5.2 -fCi 0x49090D
??
??:0

In my case it wasn't successful: either the copy I installed isn't identical to yours, or it's stripped.
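The base-address subtraction from earlier in this answer is easy to get wrong by hand, so here is a small Python sketch of it. The helper name `offset_in_object` is hypothetical; it also checks that the ip really falls inside the mapped range before subtracting, which is an extra sanity check I added rather than something addr2line requires.

```python
def offset_in_object(ip, base, size):
    """Return the offset of ip inside an object mapped at [base, base + size)."""
    if not (base <= ip < base + size):
        raise ValueError("ip is outside the mapped object")
    return ip - base


# (ip, base, size) triples taken from the four messages in the question.
crashes = [
    (0x00007f9bebcca90d, 0x7f9beb83a000, 0xf6f000),
    (0x00007fa44d78890d, 0x7fa44d2f8000, 0xf6f000),
    (0x00007f2b0022acee, 0x7f2aff9f7000, 0xf6f000),
    (0x00007f24b21adcee, 0x7f24b197a000, 0xf6f000),
]

for ip, base, size in crashes:
    print(hex(offset_in_object(ip, base, size)))
```

Run on the question's data, the four crashes collapse to two distinct offsets, 0x49090d and 0x833cee, which suggests two crash sites inside the library rather than four.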

Tim answered Oct 22 '22 11:10