x64 memset core: is the passed buffer address truncated?

1. Problem Background

Recently a core dump occurred on one of our online search servers. The crash happened in memset() while it tried to write to an invalid address, so the process received a SIGSEGV signal. The following line is from dmesg:

is_searcher_ser[17405]: segfault at 000000002c32a668 rip 0000003da0a7b006 rsp 0000000053abc790 error 6
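For reference, the "error 6" field in that dmesg line is the x86 page-fault error code. Decoding its low three bits confirms a user-mode write to a page that was not present. The constant names below are illustrative, not taken from the kernel source:

```cpp
#include <cstdint>

// Low bits of the x86 page-fault error code (names are illustrative).
constexpr std::uint32_t kPfProt  = 1u << 0;  // 0: page not present, 1: protection violation
constexpr std::uint32_t kPfWrite = 1u << 1;  // 0: read access,      1: write access
constexpr std::uint32_t kPfUser  = 1u << 2;  // 0: kernel mode,      1: user mode

constexpr std::uint32_t kErr = 6;  // "error 6" from the dmesg line
static_assert((kErr & kPfWrite) != 0, "the faulting access was a write");
static_assert((kErr & kPfUser) != 0, "the fault happened in user mode");
static_assert((kErr & kPfProt) == 0, "the target page was not present");
```

This matches the description above: memset() tried to store to an address with no mapping behind it.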

Our online servers run the following environment:

  • OS: RHEL 5.3
  • Kernel: 2.6.18-131.el5.custom, x86_64 (64-bit)
  • GCC: 4.1.2 20080704 (Red Hat 4.1.2-44)
  • Glibc: glibc-2.5-49.6

The following is the relevant code snippet:

CHashMap<…>::CHashMap(…)
{
     …
     typedef HashEntry *HashEntryPtr;
     m_ppEntry = new HashEntryPtr[m_nHashSize];   // m_nHashSize was 389 at the time of the core
     assert(m_ppEntry != NULL);
     memset(m_ppEntry, 0x0, m_nHashSize*sizeof(HashEntryPtr)); // The core happens inside this memset() call
     …
}
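Since the problem cannot be reproduced offline, a standalone harness containing only the allocate-and-zero step may still be handy for experiments. This is a hypothetical sketch: HashEntry and the BucketArray class stand in for the elided parts of CHashMap, and m_nHashSize is assumed to be a 32-bit unsigned member.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-ins: the real HashEntry and CHashMap are elided in the
// question, so only the allocate-and-zero step is reproduced here.
struct HashEntry { HashEntry* next; };
typedef HashEntry* HashEntryPtr;

struct BucketArray {
    unsigned int m_nHashSize;  // assumed 32-bit unsigned
    HashEntryPtr* m_ppEntry;

    explicit BucketArray(unsigned int nHashSize)
        : m_nHashSize(nHashSize), m_ppEntry(NULL) {
        m_ppEntry = new HashEntryPtr[m_nHashSize];
        // m_nHashSize * 8 bytes on x86_64, i.e. 3112 bytes for 389 buckets.
        std::memset(m_ppEntry, 0x0, m_nHashSize * sizeof(HashEntryPtr));
    }
    ~BucketArray() { delete[] m_ppEntry; }

    // Owns a raw array, so copying is disabled to avoid a double delete[].
    BucketArray(const BucketArray&) = delete;
    BucketArray& operator=(const BucketArray&) = delete;
};
```

As a design note, writing `new HashEntryPtr[m_nHashSize]()` value-initializes the array to null pointers, which would make the separate memset() call unnecessary.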

The corresponding assembly code is:

…
0x000000000091fe9e <+110>:   callq  0x502638 <_Znam@plt>  // new HashEntryPtr[m_nHashSize]
0x000000000091fea3 <+115>:   mov    0xc(%rbx),%edx         // Load m_nHashSize (32-bit load, zero-extended into %rdx)
0x000000000091fea6 <+118>:   mov    %rax,%rdi               // Copy the returned pointer into %rdi (memset's 1st argument)
0x000000000091fea9 <+121>:   mov    %rax,0x20(%rbx)        // Store the pointer into m_ppEntry (%rbx holds the this pointer)
0x000000000091fead <+125>:   xor    %esi,%esi               // %esi = 0 (memset's 2nd argument, the fill byte)
0x000000000091feaf <+127>:   shl    $0x3,%rdx               // %rdx = m_nHashSize*sizeof(HashEntryPtr) (memset's 3rd argument)
0x000000000091feb3 <+131>:   callq  0x502b38 <memset@plt> // Call memset() through the PLT
…
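In this sequence only memset()'s length argument passes through a 32-bit register; the pointer travels as a full 64-bit value from %rax to %rdi. The sketch below mirrors the length computation, assuming m_nHashSize is an unsigned 32-bit member (the plain `mov` into %edx, rather than a sign-extending `movslq`, suggests this). The function name is hypothetical.

```cpp
#include <cstdint>

// Mirrors the two instructions that build memset()'s third argument:
//   mov 0xc(%rbx),%edx : 32-bit load; writing %edx zero-extends into %rdx
//   shl $0x3,%rdx      : multiply by 8 == sizeof(HashEntryPtr) on x86_64
constexpr std::uint64_t memset_length(std::uint32_t nHashSize) {
    return static_cast<std::uint64_t>(nHashSize) << 3;
}

static_assert(memset_length(389) == 3112, "389 pointers of 8 bytes each");
```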

In the core dump, the assembly of memset@plt is:

(gdb) disassemble 0x502b38
Dump of assembler code for function memset@plt:
    0x0000000000502b38 <+0>:     jmpq   *0x771b92(%rip)        # 0xc746d0 <memset@got.plt>
    0x0000000000502b3e <+6>:     pushq  $0x53
    0x0000000000502b43 <+11>:    jmpq   0x5025f8
End of assembler dump.
(gdb) x/ag 0x0000000000502b3e+0x771b92
    0xc746d0 <memset@got.plt>:      0x3da0a7acb0 <memset>
(gdb) disassemble 0x3da0a7acb0
Dump of assembler code for function memset:
    0x0000003da0a7acb0 <+0>:     cmp    $0x1,%rdx
    0x0000003da0a7acb4 <+4>:     mov    %rdi,%rax
    …
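The `x/ag` command above relies on RIP-relative addressing: the displacement is added to the address of the instruction following the jmpq, which is why 0x502b3e (not 0x502b38) appears in the expression. A small compile-time check of that arithmetic, with a hypothetical helper name:

```cpp
#include <cstdint>

// A RIP-relative operand is relative to the address of the next instruction.
constexpr std::uint64_t rip_relative_target(std::uint64_t next_insn, std::int64_t disp) {
    return next_insn + static_cast<std::uint64_t>(disp);
}

// The jmpq at 0x502b38 is 6 bytes long, so the next instruction is at 0x502b3e.
static_assert(0x502b3e - 0x502b38 == 6, "jmpq *disp32(%rip) is 6 bytes");
static_assert(rip_relative_target(0x502b3e, 0x771b92) == 0xc746d0, "memset GOT slot");
```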

From the GDB analysis above, we know that the address of memset() has already been resolved in the GOT slot used by the PLT stub. That is, the first instruction, jmpq *0x771b92(%rip), jumps directly to the first instruction of memset(). Besides, the program had been running online for nearly a day, so the relocation for memset() should have been resolved long before.
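The "already resolved" argument can be written down as a check. Under standard x86_64 lazy binding (an assumption about how this binary was linked), an unresolved GOT slot still points back into its own PLT stub, at the `pushq` that follows the 6-byte `jmpq`. Here the slot instead holds 0x3da0a7acb0, the entry of memset() in libc. The helper name is hypothetical.

```cpp
#include <cstdint>

// Assumption: standard x86_64 lazy binding. Each PLT stub starts with a 6-byte
// `jmpq *GOT(%rip)`; until the dynamic linker resolves the symbol, the GOT slot
// points back to the stub's next instruction (the `pushq`).
constexpr std::uint64_t kPltJmpSize = 6;

// Hypothetical helper: true once the slot no longer points back into the stub.
constexpr bool got_slot_resolved(std::uint64_t got_value, std::uint64_t plt_stub) {
    return got_value != plt_stub + kPltJmpSize;
}

static_assert(got_slot_resolved(0x3da0a7acb0, 0x502b38), "slot holds libc memset()");
static_assert(!got_slot_resolved(0x502b3e, 0x502b38), "an unresolved slot would hold 0x502b3e");
```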

2. Weird Phenomenon

The crash happened at this instruction in memset(): => 0x0000003da0a7b006 <+854>: mov %rdx,-0x8(%rdi). This instruction stores 0 at the very beginning of the buffer passed as memset()'s first argument.

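At the time of the crash, in frame 0, $rdi is 0x2c32a670 and $rax is 0x2c32a668. The store target %rdi - 8 is therefore 0x2c32a668, exactly the faulting address reported by dmesg. From the assembly analysis and offline tests, $rax should hold the destination buffer of memset(), i.e., its first argument (memset() copies %rdi into %rax at <+4> so that it can return that pointer).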
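The numbers from dmesg and from the core are mutually consistent, which supports reading $rax as the buffer address memset() actually received. These compile-time checks use only values quoted in this post:

```cpp
#include <cstdint>

// Values quoted from dmesg, the GOT slot, and frame 0 of the core.
constexpr std::uint64_t kMemsetEntry = 0x3da0a7acb0;  // memset() entry in libc
constexpr std::uint64_t kFaultRip    = 0x3da0a7b006;  // dmesg "rip"
constexpr std::uint64_t kFaultAddr   = 0x2c32a668;    // dmesg "segfault at"
constexpr std::uint64_t kRdiAtCrash  = 0x2c32a670;    // frame 0 $rdi
constexpr std::uint64_t kRaxAtCrash  = 0x2c32a668;    // frame 0 $rax

// The faulting instruction really is memset()+854.
static_assert(kFaultRip - kMemsetEntry == 854, "rip == memset+854");
// `mov %rdx,-0x8(%rdi)` writes to %rdi - 8, the address dmesg reports...
static_assert(kRdiAtCrash - 0x8 == kFaultAddr, "store target == fault address");
// ...which is also the destination pointer memset() saved in %rax.
static_assert(kRaxAtCrash == kFaultAddr, "$rax == fault address");
```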

So $rax should be the same as m_ppEntry, whose value is stored into the object (the this pointer is in %rbx) just before memset() zeroes the buffer. However, the value of m_ppEntry in the core is 0x2ab02c32a668.

Checking with the GDB command info files shows that the address 0x2c32a668 is indeed invalid (not mapped), while 0x2ab02c32a668 is a valid address.
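For a live process, a rough analogue of checking an address with info files is to look it up in /proc/self/maps. This Linux-only sketch (the function name is hypothetical) does that; for the core file itself, GDB's view of the mappings is what counts.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Linux-only sketch: returns true if `addr` lies inside one of the current
// process's mappings listed in /proc/self/maps ("lo-hi perms ..." per line).
bool is_mapped(std::uintptr_t addr) {
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        std::istringstream in(line);
        std::uintptr_t lo = 0, hi = 0;
        char dash = 0;
        // Both bounds are hexadecimal; std::hex stays set for the second read.
        if (!(in >> std::hex >> lo >> dash >> hi) || dash != '-') continue;
        if (addr >= lo && addr < hi) return true;  // ranges are half-open
    }
    return false;
}
```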

3. Why Is It Weird?

What makes this core weird is the following: if the real address of memset() had already been resolved (which is very likely), then only a handful of instructions separate storing the pointer into m_ppEntry from the attempt to memset() it. Moreover, the register $rax (holding the buffer address being passed) is not modified at all by these instructions. So how can m_ppEntry differ from $rax?

Even weirder: at the time of the core, the value of $rax (0x2c32a668) is exactly the lower 4 bytes of m_ppEntry (0x2ab02c32a668). If the two values really are related, was the m_ppEntry argument truncated on its way into memset()? However, all the instructions involved operate on %rax/%rdi, not %eax/%edi. By the way, I cannot reproduce this issue offline.
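The suspected truncation can be stated precisely: keep only the low 32 bits of the 64-bit pointer and zero-extend the result. The hypothetical helper below shows that this maps m_ppEntry exactly onto the $rax value from the core. It illustrates the arithmetic only and says nothing about where such a truncation could have happened.

```cpp
#include <cstdint>

// Hypothetical helper: what a 64-bit address looks like after its upper
// 32 bits are lost and the remaining value is zero-extended back to 64 bits.
constexpr std::uint64_t truncate_to_32(std::uint64_t addr) {
    return static_cast<std::uint32_t>(addr);
}

static_assert(truncate_to_32(0x2ab02c32a668) == 0x2c32a668, "m_ppEntry -> $rax in the core");
```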

So,

1) Which address is valid? If 0x2c32a668 is valid, was the heap corrupted in the few instructions in between? And how would one then explain that m_ppEntry is 0x2ab02c32a668, and why the low 4 bytes of the two values are the same?

2) If 0x2ab02c32a668 is valid, why is the address truncated when passed to the 64-bit memset()? Under what conditions could this happen? I cannot reproduce it offline. Is this a known bug? I couldn't find anything on Google.

3) Or could some hardware or power issue have zeroed the upper 4 bytes of %rdi passed to memset()? (I'm very reluctant to believe this.)

Finally, any comments on this core are appreciated.

Thanks,

Gary Hu

asked Dec 05 '12 13:12 by user1878089


1 Answer

I'm assuming this code works fine most of the time, given that you mention it had been running for a day. I agree that signals are worth inspecting; it does look suspiciously like pointer truncation is happening somewhere else.

The only other thing I can think of is an issue with the new. Is there any possibility that you could occasionally end up calling an overloaded operator new? Also, for completeness, what is the declaration of m_ppEntry? I'm assuming you're using a nothrow new; otherwise the assert(m_ppEntry != NULL); would be meaningless.
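To illustrate the point about the assert: `_Znam` in the disassembly is the mangled name of `operator new[](unsigned long)`, the plain throwing form, which reports failure by throwing std::bad_alloc rather than returning NULL. Only the std::nothrow form can return a null pointer. And because the call goes through `_Znam@plt`, a replacement operator new[] defined in the executable or in another loaded library would be the one actually called. A minimal sketch of both forms, with HashEntry left as a hypothetical incomplete type and hypothetical function names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

struct HashEntry;  // hypothetical; only pointers to it are stored
typedef HashEntry* HashEntryPtr;

// Plain new[] (calls operator new[](unsigned long), i.e. _Znam): throws
// std::bad_alloc on failure, so a NULL check on its result can never fire.
HashEntryPtr* alloc_buckets_throwing(std::size_t n) {
    HashEntryPtr* p = new HashEntryPtr[n];
    std::memset(p, 0, n * sizeof(HashEntryPtr));
    return p;
}

// nothrow new[]: returns a null pointer on failure, so the check is meaningful.
HashEntryPtr* alloc_buckets_nothrow(std::size_t n) {
    HashEntryPtr* p = new (std::nothrow) HashEntryPtr[n];
    if (p == NULL) {
        return NULL;  // caller must handle allocation failure
    }
    std::memset(p, 0, n * sizeof(HashEntryPtr));
    return p;
}
```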

answered Nov 10 '22 23:11 by AlgebraWinter