Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Weird SIGSEGV segmentation fault in std::string::assign() method from libstdc++.so.6

My program recently encountered a weird segfault when running. I want to know if somebody had met this error before and how it could be fixed. Here is more info:

Basic info:

  • CentOS 5.2, kernal version is 2.6.18
  • g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-50)
  • CPU: Intel x86 family
  • libstdc++.so.6.0.8
  • My program will start multiple threads to process data. The segfault occurred in one of the threads.
  • Though it's a multi-thread program, the segfault seemed to occur on a local std::string object. I'll show this in the code snippet later.
  • The program is compiled with -g, -Wall and -fPIC, and without -O2 or other optimization options.

The core dump info:

Core was generated by `./myprog'.
Program terminated with signal 11, Segmentation fault.
#0  0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
(gdb) bt
#0  0x06f6d919 in __gnu_cxx::__exchange_and_add(int volatile*, int) () from /usr/lib/libstdc++.so.6
#1  0x06f507c3 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
#2  0x06f50834 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib/libstdc++.so.6
#3  0x081402fc in Q_gdw::ProcessData (this=0xb2f79f60) at ../../../myprog/src/Q_gdw/Q_gdw.cpp:798
#4  0x08117d3a in DataParser::Parse (this=0x8222720) at ../../../myprog/src/DataParser.cpp:367
#5  0x08119160 in DataParser::run (this=0x8222720) at ../../../myprog/src/DataParser.cpp:338
#6  0x080852ed in Utility::__dispatch (arg=0x8222720) at ../../../common/thread/Thread.cpp:603
#7  0x0052c832 in start_thread () from /lib/libpthread.so.0
#8  0x00ca845e in clone () from /lib/libc.so.6

Please note that the segfault begins within the basic_string::operator=().

The related code: (I've shown more code than that might be needed, and please ignore the coding style things for now.)

int Q_gdw::ProcessData()
{
    char tmpTime[10+1] = {0};
    char A01Time[12+1] = {0};
    std::string tmpTimeStamp;

    // Get the timestamp from TP
    if((m_BackFrameBuff[11] & 0x80) >> 7)
    {
        for (i = 0; i < 12; i++)
        {
            A01Time[i] = (char)A15Result[i];
        }
        tmpTimeStamp = FormatTimeStamp(A01Time, 12);  // Segfault occurs on this line

And here is the prototype of this FormatTimeStamp method:

std::string FormatTimeStamp(const char *time, int len)

I think such string assignment operations should be a kind of commonly used one, but I just don't understand why a segfault could occurr here.

What I have investigated:

I've searched on the web for answers. I looked at here. The reply says try to recompile the program with _GLIBCXX_FULLY_DYNAMIC_STRING macro defined. I tried but the crash still happens.

I also looked at here. It also says to recompile the program with _GLIBCXX_FULLY_DYNAMIC_STRING, but the author seems to be dealing with a different problem with mine, thus I don't think his solution works for me.


Updated on 08/15/2011

Here is the original code of this FormatTimeStamp. I understand the coding doesn't look very nice(too many magic numbers, for instance..), but let's focus on the crash issue first.

string Q_gdw::FormatTimeStamp(const char *time, int len)
{
    string timeStamp;
    string tmpstring;

    if (time)  // It is guaranteed that "time" is correctly zero-terminated, so don't worry about any overflow here.
        tmpstring = time;

    // Get the current time point.
    int year, month, day, hour, minute, second;
#ifndef _WIN32
    struct timeval timeVal;
    struct tm *p;
    gettimeofday(&timeVal, NULL);
    p = localtime(&(timeVal.tv_sec));
    year = p->tm_year + 1900;
    month = p->tm_mon + 1;
    day = p->tm_mday;
    hour = p->tm_hour;
    minute = p->tm_min;
    second = p->tm_sec;
#else
    SYSTEMTIME sys;
    GetLocalTime(&sys);
    year = sys.wYear;
    month = sys.wMonth;
    day = sys.wDay;
    hour = sys.wHour;
    minute = sys.wMinute;
    second = sys.wSecond;
#endif

    if (0 == len)
    {
        // The "time" doesn't specify any time so we just use the current time
        char tmpTime[30];
        memset(tmpTime, 0, 30);
        sprintf(tmpTime, "%d-%d-%d %d:%d:%d.000", year, month, day, hour, minute, second);
        timeStamp = tmpTime;
    }
    else if (6 == len)
    {
        // The "time" specifies "day-month-year" with each being 2-digit.
        // For example: "150811" means "August 15th, 2011".
        timeStamp = "20";
        timeStamp = timeStamp + tmpstring.substr(4, 2) + "-" + tmpstring.substr(2, 2) + "-" +
                tmpstring.substr(0, 2);
    }
    else if (8 == len)
    {
        // The "time" specifies "minute-hour-day-month" with each being 2-digit.
        // For example: "51151508" means "August 15th, 15:51".
        // As the year is not specified, the current year will be used.
        string strYear;
        stringstream sstream;
        sstream << year;
        sstream >> strYear;
        sstream.clear();

        timeStamp = strYear + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
                tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
    }
    else if (10 == len)
    {
        // The "time" specifies "minute-hour-day-month-year" with each being 2-digit.
        // For example: "5115150811" means "August 15th, 2011, 15:51".
        timeStamp = "20";
        timeStamp = timeStamp + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + "-" + tmpstring.substr(4, 2) + " " +
                tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ":00.000";
    }
    else if (12 == len)
    {
        // The "time" specifies "second-minute-hour-day-month-year" with each being 2-digit.
        // For example: "305115150811" means "August 15th, 2011, 15:51:30".
        timeStamp = "20";
        timeStamp = timeStamp + tmpstring.substr(10, 2) + "-" + tmpstring.substr(8, 2) + "-" + tmpstring.substr(6, 2) + " " +
                tmpstring.substr(4, 2) + ":" + tmpstring.substr(2, 2) + ":" + tmpstring.substr(0, 2) + ".000";
    }

    return timeStamp;
}

Updated on 08/19/2011

This problem has finally been addressed and fixed. The FormatTimeStamp() function has nothing to do with the root cause, in fact. The segfault is caused by a writing overflow of a local char buffer.

This problem can be reproduced with the following simpler program(please ignore the bad namings of some variables for now):

(Compiled with "g++ -Wall -g main.cpp")

#include <string>
#include <iostream>

void overflow_it(char * A15, char * A15Result)
{
    int m;
    int t = 0,i = 0;
    char temp[3];

    for (m = 0; m < 6; m++)
    {
        t = ((*A15 & 0xf0) >> 4) *10 ;
        t += *A15 & 0x0f;
        A15 ++;
        
        std::cout << "m = " << m << "; t = " << t << "; i = " << i << std::endl;
        
        memset(temp, 0, sizeof(temp));
        sprintf((char *)temp, "%02d", t);   // The buggy code: temp is not big enough when t is a 3-digit integer.
        A15Result[i++] = temp[0];
        A15Result[i++] = temp[1];
    }
}

int main(int argc, char * argv[])
{
    std::string str;

    {
        char tpTime[6] = {0};
        char A15Result[12] = {0};
        
        // Initialize tpTime
        for(int i = 0; i < 6; i++)
            tpTime[i] = char(154);  // 154 would result in a 3-digit t in overflow_it().
        
        overflow_it(tpTime, A15Result);
        
        str.assign(A15Result);
    }

    std::cout << "str says: " << str << std::endl;

    return 0;
}

Here are two facts we should remember before going on: 1). My machine is an Intel x86 machine so it's using the Little Endian rule. Therefore for a variable "m" of int type, whose value is, say, 10, it's memory layout might be like this:

Starting addr:0xbf89bebc: m(byte#1): 10
               0xbf89bebd: m(byte#2): 0
               0xbf89bebe: m(byte#3): 0
               0xbf89bebf: m(byte#4): 0

2). The program above runs within the main thread. When it comes to the overflow_it() function, the variables layout in the thread stack looks like this(which only shows the important variables):

0xbfc609e9 : temp[0]
0xbfc609ea : temp[1]
0xbfc609eb : temp[2]
0xbfc609ec : m(byte#1) <-- Note that m follows temp immediately.  m(byte#1) happens to be the byte temp[3].
0xbfc609ed : m(byte#2)
0xbfc609ee : m(byte#3)
0xbfc609ef : m(byte#4)
0xbfc609f0 : t
...(3 bytes)
0xbfc609f4 : i
...(3 bytes)
...(etc. etc. etc...)
0xbfc60a26 : A15Result  <-- Data would be written to this buffer in overflow_it()
...(11 bytes)
0xbfc60a32 : tpTime
...(5 bytes)
0xbfc60a38 : str    <-- Note the str takes up 4 bytes.  Its starting address is **16 bytes** behind A15Result.

My analysis:

1). m is a counter in overflow_it() whose value is incremented by 1 at each for loop and whose max value is supposed not greater than 6. Thus it's value could be stored completely in m(byte#1)(remember it's Little Endian) which happens to be temp3.

2). In the buggy line: When t is a 3-digit integer, such as 109, then the sprintf() call would result in a buffer overflow, because serializing the number 109 to the string "109" actually requires 4 bytes: '1', '0', '9' and a terminating '\0'. Because temp[] is allocated with 3 bytes only, the final '\0' would definitely be written to temp3, which is just the m(byte#1), which unfortunately stores m's value. As a result, m's value is reset to 0 every time.

3). The programmer's expectation, however, is that the for loop in the overflow_it() would execute 6 times only, with each time m being incremented by 1. Because m is always reset to 0, the actual loop time is far more than 6 times.

4). Let's look at the variable i in overflow_it(): Every time the for loop is executed, i's value is incremented by 2, and A15Result[i] will be accessed. However, if you compile and run this program, you'll see the i value finally adds up to 24, which means the overflow_it() writes data to the bytes ranging from A15Result[0] to A15Result[23]. Note that the object str is only 16 bytes behind A15Result[0], thus the overflow_it() has "sweeped through" str and destroy it's correct memory layout.

5). I think the correct use of std::string, as it is a non-POD data structure, depends on that that instantiated std::string object must have a correct internal state. But in this program, str's internal layout has been changed by force externally. This should be why the assign() method call would finally cause a segfault.


Update on 08/26/2011

In my previous update on 08/19/2011, I said that the segfault was caused by a method call on a local std::string object whose memory layout had been broken and thus became a "destroyed" object. This is not an "always" true story. Consider the C++ program below:

//C++
class A {
    public:
        void Hello(const std::string& name) {
           std::cout << "hello " << name;
         }
};
int main(int argc, char** argv)
{
    A* pa = NULL; //!!
    pa->Hello("world");
    return 0;
}

The Hello() call would succeed. It would succeed even if you assign an obviously bad pointer to pa. The reason is: the non-virtual methods of a class don't reside within the memory layout of the object, according to the C++ object model. The C++ compiler turns the A::Hello() method to something like, say, A_Hello_xxx(A * const this, ...) which could be a global function. Thus, as long as you don't operate on the "this" pointer, things could go pretty well.

This fact shows that a "bad" object is NOT the root cause that results in the SIGSEGV segfault. The assign() method is not virtual in std::string, thus the "bad" std::string object wouldn't cause the segfault. There must be some other reason that finally caused the segfault.

I noticed that the segfault comes from the __gnu_cxx::__exchange_and_add() function, so I then looked into its source code in this web page:

00046   static inline _Atomic_word 
00047   __exchange_and_add(volatile _Atomic_word* __mem, int __val)
00048   { return __sync_fetch_and_add(__mem, __val); }

The __exchange_and_add() finally calls the __sync_fetch_and_add(). According to this web page, the __sync_fetch_and_add() is a GCC builtin function whose behavior is like this:

type __sync_fetch_and_add (type *ptr, type value, ...)
{
    tmp = *ptr; 
    *ptr op= value; // Here the "op=" means "+=" as this function is "_and_add".
    return tmp;
}

There it is! The passed-in ptr pointer is dereferenced here. In the 08/19/2011 program, the ptr is actually the "this" pointer of the "bad" std::string object within the assign() method. It is the derefenence at this point that actually caused the SIGSEGV segmentation fault.

We could test this with the following program:

#include <bits/atomicity.h>

int main(int argc, char * argv[])
{
    __sync_fetch_and_add((_Atomic_word *)0, 10);    // Would result in a segfault.

    return 0;
}
like image 820
yaobin Avatar asked Aug 12 '11 09:08

yaobin


People also ask

How do you avoid SIGSEGV errors?

Avoid naked pointers (prefer smart pointers, such as std::unique_ptr or std::shared_ptr for pointers that own data, and use iterators into standard containers if you want to merely point at stuff) Use standard containers (e.g. std::vector ) instead of arrays and pointer arithmetics.

What causes SIGSEGV error?

A SIGSEGV signal or segmentation error occurs when a process attempts to use a memory address that was not assigned to it by the MMU.

What happens during a Segfault?

A segmentation fault occurs when a program attempts to access a memory location that it is not allowed to access, or attempts to access a memory location in a way that is not allowed (for example, attempting to write to a read-only location, or to overwrite part of the operating system).

Can you catch Segfault?

You can't catch segfaults. Segfaults lead to undefined behavior - period (err, actually segfaults are the result of operations also leading to undefined behavior.


1 Answers

There are two likely possibilities:

  • some code before line 798 has corrupted the local tmpTimeStamp object
  • the return value from FormatTimeStamp() was somehow bad.

The _GLIBCXX_FULLY_DYNAMIC_STRING is most likely a red herring and has nothing to do with the problem.

If you install debuginfo package for libstdc++ (I don't know what it's called on CentOS), you'll be able to "see into" that code, and might be able to tell whether the left-hand-side (LHS) or the RHS of the assignment operator caused the problem.

If that's not possible, you'll have to debug this at the assembly level. Going into frame #2 and doing x/4x $ebp should give you previous ebp, caller address (0x081402fc), LHS (should match &tmpTimeStamp in frame #3), and RHS. Go from there, and good luck!

like image 118
Employed Russian Avatar answered Oct 05 '22 23:10

Employed Russian