c_str() vs. data() when it comes to return type

Tags:

c++

string

c++17

c-str

After C++11, I thought of c_str() and data() as equivalent.

C++17 introduces an overload for the latter that returns a non-constant pointer (according to the reference, which I am not sure is completely up to date w.r.t. C++17):

const CharT* data() const;    (1)   
CharT* data();                (2)   (since C++17)

c_str(), however, only returns a constant pointer:

const CharT* c_str() const;

Why differentiate these two methods in C++17, especially when C++11 was the standard that made them homogeneous? In other words, why did only one method get an overload, while the other didn't?

385

asked Nov 27 '18 13:11

gsamaras

2 Answers

The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given a new overload while c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:

Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.
The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing with code that takes a "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str with data, there's no particular reason to add to this legacy interface.
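
To make the two points above concrete, here is a minimal sketch, assuming a C++17 compiler. The `which` overload set and the function `overloads_chosen_in_cpp17` are hypothetical names invented for illustration; this is one way a new non-const overload can change which function gets called, not necessarily the breakage the answer has in mind.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Return types of the accessors as of C++17.
static_assert(std::is_same_v<decltype(std::declval<std::string&>().data()), char*>);
static_assert(std::is_same_v<decltype(std::declval<const std::string&>().data()), const char*>);
static_assert(std::is_same_v<decltype(std::declval<std::string&>().c_str()), const char*>);

// Hypothetical overload set: reports which pointer type it was called with.
inline const char* which(const char*) { return "const char*"; }
inline const char* which(char*) { return "char*"; }

// Returns the overload selected for c_str() and for data() on a non-const string.
inline std::pair<std::string, std::string> overloads_chosen_in_cpp17() {
    std::string s = "abc";
    // Both accessors point at the same null-terminated buffer.
    assert(s.data() == s.c_str());
    return {which(s.c_str()), which(s.data())};
}
```

Before C++17, both calls would have selected the `const char*` overload; with the new overload, `data()` on a non-const string selects the `char*` one, while `c_str()` is unaffected.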

I realize that the very motivation for P0272R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API than absolutely necessary.

One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.
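
The null-terminator rule can be sketched as follows, assuming C++17. The function name `edit_in_place` is made up for this example.

```cpp
#include <cassert>
#include <string>

// Modifies a copy of the string through the non-const data() pointer.
inline std::string edit_in_place(std::string s) {
    if (s.empty()) {
        return s;  // p[0] would be the terminator itself; writing 'J' there is UB
    }
    char* p = s.data();
    assert(p[s.size()] == '\0');  // the buffer is always null-terminated
    p[0] = 'J';                   // ordinary element write
    p[s.size()] = '\0';           // permitted in C++17: the value written is zero
    // p[s.size()] = '!';         // undefined behaviour: non-zero value in the terminator slot
    return s;
}
```

For example, `edit_in_place("hello")` yields "Jello" and leaves the size unchanged at 5.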

187

answered Oct 13 '22 00:10

Kerrek SB

The reason why the data() member got an overload is explained in this paper (P0272) at open-std.org.

TL;DR of the paper: The non-const .data() member function for std::string was added to improve uniformity in the standard library and to help C++ developers write correct code. It is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters.

Some relevant passages from the paper:

Abstract
Is std::string's lack of a non-const .data() member function an oversight or an intentional design based on pre-C++11 std::string semantics? In either case, this lack of functionality tempts developers to use unsafe alternatives in several legitimate scenarios. This paper argues for the addition of a non-const .data() member function for std::string to improve uniformity in the standard library and to help C++ developers write correct code.

Use Cases
C libraries occasionally include routines that have char * parameters. One example is the lpCommandLine parameter of the CreateProcess function in the Windows API. Because the data() member of std::string is const, it cannot be used to make std::string objects work with the lpCommandLine parameter. Developers are tempted to use .front() instead, as in the following example.
std::string programName;
// ...
if( CreateProcess( NULL, &programName.front(), /* etc. */ ) ) {
  // etc.
} else {
  // handle error
}
Note that when programName is empty, the programName.front() expression causes undefined behavior. A temporary empty C-string fixes the bug.
std::string programName;
// ...

char emptyString[] = {'\0'};  // writable fallback for the empty case
if( CreateProcess( NULL, programName.empty() ? emptyString : &programName.front(), /* etc. */ ) ) {
  // etc.
} else {
  // handle error
}
If there were a non-const .data() member, as there is with std::vector, the correct code would be straightforward.
std::string programName;
// ...
if( CreateProcess( NULL, programName.data(), /* etc. */ ) ) {
  // etc.
} else {
  // handle error
}
A non-const .data() std::string member function is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters. This is common in older codes and those that need to be portable with older C compilers.
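
A portable sketch of the same idea, assuming C++17. `legacy_count_chars` is a hypothetical stand-in for a C-style API such as CreateProcess that takes char* even though it only reads, and `call_legacy` is likewise an invented wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical legacy API: takes a mutable pointer although it only reads.
inline std::size_t legacy_count_chars(char* text) {
    assert(text != nullptr);
    return std::strlen(text);
}

// Passes the string's own buffer; no const_cast and no &s.front() needed.
inline std::size_t call_legacy(std::string& s) {
    // data() returns a valid, null-terminated pointer even when s is empty,
    // whereas s.front() on an empty string is undefined behaviour.
    return legacy_count_chars(s.data());
}
```

With this wrapper, a string holding "notepad.exe" reports 11 characters and an empty string reports 0, with no special case for the empty string.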

answered Oct 12 '22 22:10

P.W
