
std::string in a multi-threaded program

Tags: c++, stl

Given that:

1) The C++03 standard does not address the existence of threads in any way

2) The C++03 standard leaves it up to implementations to decide whether std::string should use Copy-on-Write semantics in its copy-constructor

3) Copy-on-Write semantics often lead to unpredictable behavior in a multi-threaded program

I come to the following, seemingly controversial, conclusion:

You simply cannot safely and portably use std::string in a multi-threaded program.

Obviously, no STL data structure is thread-safe. But at least, with std::vector for example, you can simply use mutexes to protect access to the vector. With a std::string implementation that uses COW, you can't even reliably do that without editing the reference-counting semantics deep within the vendor implementation.
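To illustrate the vector case, here is a minimal sketch of a vector whose every access goes through one mutex. It assumes a C++11 compiler for std::mutex and std::thread (with C++03 this would be pthreads or a similar library), and the names GuardedVector and fill_concurrently are made up for the example.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical wrapper: every access to the vector takes the same mutex.
class GuardedVector {
public:
    void push(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        data_.push_back(value);
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.size();
    }

private:
    mutable std::mutex mutex_;  // mutable so that size() can lock it
    std::vector<int> data_;
};

// Starts `threads` workers that each push `per_thread` values,
// waits for all of them, and returns the final element count.
std::size_t fill_concurrently(GuardedVector& v, int threads, int per_thread) {
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&v, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                v.push(i);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return v.size();
}
```

The lock works here because all of the vector's state lives inside the one object being guarded. A COW string may share a hidden buffer and reference count with other string objects that this lock knows nothing about.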

Real-world example:

At my company, we have a multi-threaded application that has been thoroughly unit-tested and run through Valgrind countless times. The application ran for months with no problems whatsoever. One day, I recompiled the application with another version of gcc, and all of a sudden I got random segfaults all the time. Valgrind now reports invalid memory accesses deep within libstdc++, in the std::string copy constructor.

So what is the solution? Well, of course, I could typedef std::vector<char> as a string class - but really, that sucks. I could also wait for C++0x, which I pray will require implementors to forgo COW. Or, (shudder), I could use a custom string class. I personally always rail against developers who implement their own classes when a preexisting library will do fine, but honestly, I need a string class which I can be sure is not using COW semantics; and std::string simply doesn't guarantee that.
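For reference, the std::vector<char> option would look roughly like the sketch below. The names CharBuffer, to_buffer and from_buffer are hypothetical. The idea is that copying a vector copies its elements, so two buffers do not share storage behind the scenes.

```cpp
#include <string>
#include <vector>

// Hypothetical alias: a character buffer with plain deep-copy semantics.
typedef std::vector<char> CharBuffer;

// Copies the characters of a std::string into a buffer that owns its storage.
CharBuffer to_buffer(const std::string& s) {
    return CharBuffer(s.begin(), s.end());
}

// Builds a std::string from the buffer, e.g. to pass it to an API that needs one.
std::string from_buffer(const CharBuffer& b) {
    return std::string(b.begin(), b.end());
}
```

The cost is obvious: none of the std::string member functions (find, substr, operator+ and so on) are available on the buffer, which is why this feels like a poor substitute.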

Am I right that std::string simply cannot be used reliably at all in portable, multi-threaded programs? And what is a good workaround?

Charles Salvia asked Nov 02 '09 12:11


1 Answer

You cannot safely and portably do anything in a multi-threaded program. There is no such thing as a portable multi-threaded C++ program, precisely because threads throw everything C++ says about order of operations, and the results of modifying any variable, out the window.

There's also nothing in the standard to guarantee that vector can be used in the way you say. It would be legal to provide a C++ implementation with a threading extension in which, say, any use of a vector outside the thread in which it was initialized results in undefined behavior. The instant you start a second thread, you aren't using standard C++ any more, and you must look to your compiler vendor for what is safe and what is not.

If your vendor provides a threading extension, and also provides a std::string with COW that (therefore) cannot be made thread-safe, then I think for the time being your argument is with your vendor, or with the threading extension, not with the C++ standard. For example, arguably POSIX should have barred COW strings in programs which use pthreads.

You could possibly make it safe by having a single mutex, which you take while doing any string mutation whatsoever, and any reads of a string that's the result of a copy. But you'd probably get crippling contention on that mutex.
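A rough sketch of that single-mutex idea, assuming C++11's std::mutex is available. The helper names string_mutex, locked_copy and locked_append are invented for illustration, and only two operations are covered. A real attempt would have to route every copy, mutation, and destruction of a possibly shared string through the same lock.

```cpp
#include <mutex>
#include <string>

// Hypothetical process-wide lock shared by all string operations.
std::mutex& string_mutex() {
    static std::mutex m;
    return m;
}

// Copies a string while holding the global lock, so any hidden
// reference-count update happens under mutual exclusion.
std::string locked_copy(const std::string& src) {
    std::lock_guard<std::mutex> lock(string_mutex());
    return std::string(src);
}

// Mutates a string while holding the same lock, so a COW "unshare"
// cannot race with a copy made through locked_copy.
void locked_append(std::string& dst, const std::string& suffix) {
    std::lock_guard<std::mutex> lock(string_mutex());
    dst += suffix;
}
```

Every thread that touches any string serializes on this one mutex, which is where the contention comes from.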

2 revs answered Oct 01 '22 12:10