
Is std::string thread-safe with gcc 4.3?

I'm developing a multithreaded program running on Linux (compiled with G++ 4.3). If you search around for a bit, you find a lot of scary stories about std::string not being thread-safe with GCC. This is supposedly because it uses copy-on-write internally, which wreaks havoc with tools like Helgrind.

I've made a small program that copies one string to another. If you inspect both strings, they share the same internal _M_p pointer. When one string is modified, its pointer changes, so the copy-on-write machinery is working as intended.
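The experiment described above can be sketched roughly as follows. The original test program is not shown in the question, so the function names here (shares_buffer, separate_after_write) are hypothetical, and data() is used in place of peeking at the internal _M_p pointer.

```cpp
#include <string>

// Hypothetical helper: reports whether two strings currently use one buffer.
bool shares_buffer(const std::string& a, const std::string& b)
{
    return a.data() == b.data();
}

// Copies `orig`, modifies the copy, and returns true if the two strings
// use different buffers afterwards.
bool separate_after_write(const std::string& orig)
{
    std::string copy = orig;  // may share storage under a copy-on-write implementation
    copy += '!';              // a write forces the copy to own its buffer
    return !shares_buffer(orig, copy);
}
```

With a copy-on-write libstdc++ such as the one shipped with G++ 4.3, shares_buffer(orig, copy) would be expected to return true immediately after the copy. Newer libstdc++ builds that do not use copy-on-write would report false at that point, so only the state after the write is the same everywhere.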

What I'm worried about, though, is what happens if I share a string between two threads (for instance, by passing it as an object through a thread-safe data queue between two threads). I've already tried compiling with the '-pthread' option, but that does not seem to make much difference. So my questions:

  • Is there any way to force std::string to be threadsafe? I would not mind if the copy-on-write behaviour was disabled to achieve this.
  • How have other people solved this? Or am I being paranoid?
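One conservative way to hand a string to another thread is to enqueue a copy that owns its own buffer, so nothing is shared regardless of how the library implements copying. The sketch below is only an illustration: StringQueue and round_trip are hypothetical names, and it assumes a C++11 compiler (for std::thread and std::mutex), which is newer than the G++ 4.3 mentioned in the question.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical minimal queue; every access to the container holds one mutex.
class StringQueue {
public:
    void push(const std::string& s)
    {
        // Build the element from raw characters so it owns a private buffer
        // and shares nothing with the caller's string.
        std::string owned(s.data(), s.size());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(owned));
        }
        ready_.notify_one();
    }

    // Blocks until an element is available, then removes and returns it.
    std::string pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        std::string s = std::move(items_.front());
        items_.pop();
        return s;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::string> items_;
};

// Sends `msg` through the queue to a consumer thread and returns what arrived.
std::string round_trip(const std::string& msg)
{
    StringQueue q;
    std::string received;
    std::thread consumer([&] { received = q.pop(); });
    q.push(msg);
    consumer.join();  // join() makes the consumer's write to `received` visible
    return received;
}
```

The extra copy costs one allocation per message. Whether it is actually required depends on whether the library's reference counting is itself thread-safe, which is the open question here.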

I can't seem to find a definitive answer, so I hope you can help me.

Edit:

Wow, that's a whole lot of answers in such a short time. Thank you! I will definitely use Jack's solution when I want to disable COW. But now the main question becomes: do I really have to disable COW? Or is the 'bookkeeping' done for COW thread safe? I'm currently browsing the libstdc++ sources but that's going to take quite some time to figure out...

Edit 2

OK, I browsed the libstdc++ source code and found something like this in libstdc++-v3/include/bits/basic_string.h:

    _CharT*
    _M_refcopy() throw()
    {
    #ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
      if (__builtin_expect(this != &_S_empty_rep(), false))
    #endif
        __gnu_cxx::__atomic_add_dispatch(&this->_M_refcount, 1);
      return _M_refdata();
    }  // XXX MT

So there is definitely something there about atomic changes to the reference counter...
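To illustrate why an atomic increment matters for a shared reference counter, here is a minimal sketch using std::atomic from C++11. It is not libstdc++'s implementation: SharedRep and count_refs are hypothetical names, and the counting convention (starting at zero, release reporting the last owner) is chosen for the example only.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared string representation.
struct SharedRep {
    std::atomic<int> refcount{0};

    // Registers one more owner.
    void add_ref() { refcount.fetch_add(1, std::memory_order_relaxed); }

    // Drops one owner; returns true when the caller held the last reference.
    bool release() { return refcount.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Each of `threads` threads takes `iters` references. Returns the final count.
int count_refs(int threads, int iters)
{
    SharedRep rep;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&rep, iters] {
            for (int i = 0; i < iters; ++i)
                rep.add_ref();
        });
    for (auto& th : pool)
        th.join();
    return rep.refcount.load();
}
```

With a plain int instead of std::atomic<int>, concurrent increments could be lost and the count would be unreliable. An atomic counter protects only the bookkeeping, though. It does not make concurrent writes to one and the same string object safe.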

Conclusion

I'm marking sellibitze's comment as answer here because I think we've reached the conclusion that this area is still unresolved for now. To circumvent the COW behaviour I'd suggest Jack Lloyd's answer. Thank you everybody for an interesting discussion!

Benjamin asked Oct 20 '09 13:10



2 Answers

Threads are not yet part of the standard. But I don't think that any vendor can get away without making std::string thread-safe, nowadays. Note: There are different definitions of "thread-safe" and mine might differ from yours. Of course, it makes little sense to protect a container like std::vector for concurrent access by default even when you don't need it. That would go against the "don't pay for things you don't use" spirit of C++. The user should always be responsible for synchronization if he/she wants to share objects among different threads. The issue here is whether a library component uses and shares some hidden data structures that might lead to data races even if "functions are applied on different objects" from a user's perspective.

The C++0x draft (N2960) contains the section "data race avoidance", which basically says that library components may access shared data that is hidden from the user if and only if they actively avoid possible data races. It sounds like a copy-on-write implementation of std::basic_string must be as safe w.r.t. multi-threading as another implementation where internal data is never shared among different string instances.
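In practical terms, the "different objects" guarantee means code like the following sketch should need no locks of its own: every thread writes only to its own std::string, even though all of them were copied from one source. The function name append_in_threads is hypothetical, and the example assumes a C++11 compiler.

```cpp
#include <string>
#include <thread>
#include <vector>

// Every thread appends one digit to its own copy of `source`.
// The caller adds no synchronization beyond join(). Returns the results.
std::vector<std::string> append_in_threads(const std::string& source, int threads)
{
    // The copies are made up front; under copy-on-write they may all
    // start out pointing at one shared buffer.
    std::vector<std::string> copies(threads, source);
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&copies, t] {
            copies[t] += static_cast<char>('0' + t % 10);  // touches only its own object
        });
    for (auto& th : pool)
        th.join();
    return copies;
}
```

If the library honours the rule quoted above, any hidden sharing between the copies has to be handled internally, for example through an atomic reference count. A race detector flagging this code would then point at the library (or at a false positive in the tool), not at the caller.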

I'm not 100% sure whether libstdc++ takes care of it already, but I think it does. To be sure, check out the documentation.

sellibitze answered Sep 26 '22 01:09


If you don't mind disabling copy-on-write, this may be the best course of action. std::string's COW only works if it knows that it is copying another std::string, so you can cause it to always allocate a new block of memory and make an actual copy. For instance this code:

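    #include <string>
    #include <cstdio>

    int main()
    {
        std::string orig = "I'm the original!";
        std::string copy_cow = orig;          // copy constructor: may share orig's buffer
        std::string copy_mem = orig.c_str();  // built from a const char*: gets its own buffer

        // %p expects a void pointer, so cast the character pointers.
        std::printf("%p %p %p\n",
                    static_cast<const void*>(orig.data()),
                    static_cast<const void*>(copy_cow.data()),
                    static_cast<const void*>(copy_mem.data()));
    }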

will show that the second copy (using c_str) prevents COW, because std::string only sees a bare const char* and has no idea where it came from or what its lifetime might be, so it has to make a new private copy.
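The same trick can be wrapped in a small helper. This is a sketch, not part of the original answer: deep_copy is a hypothetical name, and it uses the (pointer, length) constructor rather than c_str() alone, since a copy made through a bare const char* stops at the first '\0' character.

```cpp
#include <string>

// Hypothetical helper: returns a copy of `s` built from its raw characters,
// so the result owns a buffer of its own. Passing the length explicitly
// also preserves any embedded '\0' characters.
std::string deep_copy(const std::string& s)
{
    return std::string(s.data(), s.size());
}
```

A call such as std::string mine = deep_copy(shared); then gives the receiving code a string whose storage is not tied to the original, at the cost of one allocation and a character copy.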

Jack Lloyd answered Sep 25 '22 01:09