Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

c_str() vs. data() when it comes to return type

After C++11, I thought of c_str() and data() equivalently.

C++17 introduces an overload for the latter, that returns a non-constant pointer (reference, which I am not sure if it's updated completely w.r.t. C++17):

const CharT* data() const;    (1)   
CharT* data();                (2)   (since C++17)

c_str() does only return a constant pointer:

const CharT* c_str() const;

Why the differentiation of these two methods in C++17, especially when C++11 was the one that made them homogeneous? In other words, why only the one method got an overload, while the other didn't?

like image 385
gsamaras Avatar asked Nov 27 '18 13:11

gsamaras


People also ask

What type does c_str return?

The function c_str() returns a const pointer to a regular C string, identical to the current string. The returned string is null-terminated. Note that since the returned pointer is of type (C/C++ Keywords) const, the character data that c_str() returns cannot be modified.

What is c_str () used for?

The c_str() method converts a string to an array of characters with a null character at the end. The function takes in no parameters and returns a pointer to this character array (also called a c-string).


2 Answers

The new overload was added by P0272R1 for C++17. Neither the paper itself nor the links therein discuss why only data was given new overloads but c_str was not. We can only speculate at this point (unless people involved in the discussion chime in), but I'd like to offer the following points for consideration:

  • Even just adding the overload to data broke some code; keeping this change conservative was a way to minimize negative impact.

  • The c_str function had so far been entirely identical to data and is effectively a "legacy" facility for interfacing code that takes "C string", i.e. an immutable, null-terminated char array. Since you can always replace c_str by data, there's no particular reason to add to this legacy interface.

I realize that the very motivation for P0292R1 was that there do exist legacy APIs that erroneously or for C reasons take only mutable pointers even though they don't mutate. All the same, I suppose we don't want to add more to string's already massive API that absolutely necessary.

One more point: as of C++17 you are now allowed to write to the null terminator, as long as you write the value zero. (Previously, it used to be UB to write anything to the null terminator.) A mutable c_str would create yet another entry point into this particular subtlety, and the fewer subtleties we have, the better.

like image 187
Kerrek SB Avatar answered Oct 13 '22 00:10

Kerrek SB


The reason why the data() member got an overload is explained in this paper at open-std.org.

TL;DR of the paper: The non-const .data() member function for std::string was added to improve uniformity in the standard library and to help C++ developers write correct code. It is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters.

Some relevant passages from the paper:

Abstract
Is std::string's lack of a non-const .data() member function an oversight or an intentional design based on pre-C++11 std::string semantics? In either case, this lack of functionality tempts developers to use unsafe alternatives in several legitimate scenarios. This paper argues for the addition of a non-const .data() member function for std::string to improve uniformity in the standard library and to help C++ developers write correct code.

Use Cases
C libraries occasionally include routines that have char * parameters. One example is the lpCommandLine parameter of the CreateProcess function in the Windows API. Because the data() member of std::string is const, it cannot be used to make std::string objects work with the lpCommandLine parameter. Developers are tempted to use .front() instead, as in the following example.

std::string programName;
// ...
if( CreateProcess( NULL, &programName.front(), /* etc. */ ) ) {
  // etc.
} else {
  // handle error
}

Note that when programName is empty, the programName.front() expression causes undefined behavior. A temporary empty C-string fixes the bug.

std::string programName;
// ...

if( !programName.empty() ) { 
  char emptyString[] = {'\0'};    
  if( CreateProcess( NULL, programName.empty() ? emptyString : &programName.front(), /* etc. */ ) ) {
    // etc.
  } else {
    // handle error
  }
}

If there were a non-const .data() member, as there is with std::vector, the correct code would be straightforward.

std::string programName;
// ...
if( !programName.empty() ) {
  char emptyString[] = {'\0'};
  if( CreateProcess( NULL, programName.data(), /* etc. */ ) ) {
    // etc.
  } else {
    // handle error
  }
}

A non-const .data() std::string member function is also convenient when calling a C-library function that doesn't have const qualification on its C-string parameters. This is common in older codes and those that need to be portable with older C compilers.

like image 22
P.W Avatar answered Oct 12 '22 22:10

P.W