Which string classes to use in C++?

We have a multi-threaded desktop application in C++ (MFC). Currently, developers use either CString or std::string, probably depending on their mood. We'd like to standardize on a single implementation (possibly something other than those two).

MFC's CString is based on the copy-on-write (COW) idiom, and some people would claim this is unacceptable in a multithreaded environment (and would probably refer to this article). I am not convinced by such claims: atomic counters seem to be quite fast, and their overhead is at least partly compensated by a reduction in memory re-allocations.

I learned that the std::string implementation depends on the compiler: it is not COW in MSVC, but it is, or was, in gcc. As far as I understand, the new C++0x standard is going to fix this by requiring a non-COW implementation, and it will resolve some other issues, such as the contiguous buffer requirement. So std::string does not look well defined at this point...

A quick example of what I don't like about std::string: there is no way to return a string from a function without excessive re-allocations. Returning by value invokes the copy constructor, and there is no access to the internal buffer to optimize that, so "return by reference" (e.g. std::string& Result) doesn't help. I can do this with CString by either returning by value (no copy due to COW) or passing by reference and accessing the buffer directly. Again, C++0x comes to the rescue with its rvalue references, but we are not going to have C++0x in the near future.

Which string class should we use? Can COW really become an issue? Are there other commonly used efficient implementations of strings? Thanks.

EDIT: We don't use Unicode at the moment, and it is unlikely that we will need it. However, if there is something that easily supports Unicode (without the cost of ICU...), that would be a plus.

Roman L asked Jan 17 '11 14:01


1 Answer

I would use std::string.

  • Promotes decoupling from MFC
  • Better interaction with existing C++ libraries

The "return by value" issue is mostly a non-issue. Compilers are very good at performing Return Value Optimization (RVO), which eliminates the copy in most cases when returning by value. If it doesn't, you can usually tweak the function (for example, so that every return statement returns the same local variable).
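To make the point above concrete, here is a minimal sketch of a function that returns a std::string by value in an RVO-friendly way, next to the out-parameter variant one could fall back on. The names repeat_by_value, repeat_into and rvo_example_usage are made up for illustration and are not from the question; whether the copy is actually elided depends on the compiler and its settings, so treat this as a sketch rather than a guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds the result in one local and returns it by value.
// Compilers commonly construct `result` directly in the caller's storage
// (named return value optimization), so the buffer is not copied on return.
std::string repeat_by_value(const std::string& word, std::size_t times) {
    std::string result;
    result.reserve(word.size() * times);  // request the full capacity up front
    for (std::size_t i = 0; i < times; ++i) {
        result += word;
    }
    return result;  // a single return of a single local keeps elision easy
}

// Possible "tweak" if a profiler ever shows a copy: fill a caller-owned string.
void repeat_into(const std::string& word, std::size_t times, std::string& out) {
    out.clear();
    out.reserve(word.size() * times);
    for (std::size_t i = 0; i < times; ++i) {
        out += word;
    }
}

// Both variants produce the same text.
void rvo_example_usage() {
    const std::string a = repeat_by_value("ab", 3);
    std::string b;
    repeat_into("ab", 3, b);
    assert(a == "ababab");
    assert(a == b);
}
```

The by-value version is the one I would write by default; the out-parameter version is only worth it when a caller reuses the same string across many calls.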

COW has been rejected for a reason: it doesn't scale (well), and the hoped-for increase in speed has not really shown up in measurements (see Herb Sutter's article). Atomic operations are not as cheap as they appear. On a single-processor, single-core machine they were cheap, but multi-core machines are now a commodity and multi-processor machines are widely available (for servers). In such architectures there are multiple caches that need to be synchronized, and the more distributed the architecture, the more costly the atomic operations.
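For readers who have not seen COW spelled out, here is a toy sketch of the idea, assuming a C++11 compiler. CowString and cow_example_usage are hypothetical names; this is not how CString or any std::string is actually implemented. It leans on std::shared_ptr, whose reference count is maintained in a thread-safe way (typically with atomic operations), which is exactly the cost discussed above: every copy and every destruction touches a counter shared between all copies.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy copy-on-write string (illustrative assumption, not a real library class).
class CowString {
public:
    explicit CowString(const std::string& text)
        : data_(std::make_shared<std::string>(text)) {}

    // The implicit copy constructor copies the shared_ptr: no character data
    // is copied, but the shared reference count is incremented.

    const std::string& read() const { return *data_; }

    // Writers must first make sure they own the buffer exclusively.
    void append(const std::string& more) {
        detach();
        *data_ += more;
    }

    bool shares_buffer_with(const CowString& other) const {
        return data_ == other.data_;
    }

private:
    // Clone the buffer if another CowString still refers to it.
    // Note: this check is not enough for concurrent use of the same object;
    // the sketch only illustrates the single-threaded mechanics.
    void detach() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::string>(*data_);
        }
    }

    std::shared_ptr<std::string> data_;
};

void cow_example_usage() {
    CowString a("hello");
    CowString b = a;                   // cheap copy: buffer is shared
    assert(a.shares_buffer_with(b));
    b.append(" world");                // first write triggers the real copy
    assert(!a.shares_buffer_with(b));
    assert(a.read() == "hello");
    assert(b.read() == "hello world");
}
```

The copy is deferred, not avoided: it happens on the first write, and in the meantime every copy pays for the counter update even if no buffer is ever shared across threads.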

Does CString implement the Small String Optimization? It's a simple trick that lets a string avoid allocating any memory for small strings (usually a few characters). It is very useful because it turns out that most strings are in fact small: how many strings in your application are less than 8 characters long?
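If you are curious whether your own std::string applies this optimization, a rough probe is to check whether the character buffer lives inside the string object itself. The helper names stored_inline and sso_example_usage are made up for this sketch, and the result for short strings is implementation-dependent, so the code only asserts the case that holds everywhere.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Returns true when the character data lies inside the std::string object,
// i.e. no separate heap buffer is used for this particular string.
bool stored_inline(const std::string& s) {
    const std::uintptr_t object_begin = reinterpret_cast<std::uintptr_t>(&s);
    const std::uintptr_t object_end = object_begin + sizeof(std::string);
    const std::uintptr_t buffer = reinterpret_cast<std::uintptr_t>(s.data());
    return buffer >= object_begin && buffer < object_end;
}

void sso_example_usage() {
    // 1000 characters cannot fit inside the object, so they live elsewhere.
    const std::string long_text(1000, 'x');
    assert(!stored_inline(long_text));

    // Implementation-dependent: true on libraries with the small string
    // optimization, false on libraries without it.
    const std::string short_text = "abc";
    (void)stored_inline(short_text);
}
```

Running stored_inline on strings of increasing length is a quick way to find the threshold your standard library uses, if it uses one at all.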

So, unless you present me with a real benchmark that clearly shows a net gain from using CString, I'd prefer sticking with the standard: it's standard, and likely better optimized.
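As a starting point for such a benchmark, here is a minimal timing sketch, assuming a C++11 compiler. seconds_to_copy and benchmark_example_usage are hypothetical names; the function is templated on the string type so that, in an MFC build, CString could be plugged in next to std::string. It only measures copy construction on a single thread, so a real comparison would need your application's actual workload and thread count.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>
#include <vector>

// Times `copies` copy constructions of `source`. The copies are kept in a
// vector so that they stay alive until the measurement is finished.
template <class String>
double seconds_to_copy(const String& source, std::size_t copies) {
    std::vector<String> sink;
    sink.reserve(copies);  // keep vector growth out of the timed loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < copies; ++i) {
        sink.push_back(source);
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

void benchmark_example_usage() {
    const std::string sample(64, 'x');
    const double elapsed = seconds_to_copy(sample, 10000);
    assert(elapsed >= 0.0);  // steady_clock never goes backwards
}
```

Repeat the measurement several times and with several string lengths before drawing conclusions; a single run of a micro-benchmark like this is mostly noise.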

Matthieu M. answered Oct 08 '22 19:10
