
Why aren't string literals passed as references to arrays instead of opaque pointers?

In C++, the type of string literals is const char [N], where N, as std::size_t, is the number of characters plus one (the zero-byte terminator). They reside in static storage and are available from program initialization to termination.
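The claims about the literal's type can be checked at compile time. The following is a minimal sketch, assuming a C++11 or later compiler; the names `literal_size` and `empty_literal_size` are made up for illustration.

```cpp
#include <cstddef>
#include <type_traits>

// A string literal is an lvalue, so decltype yields a reference to its
// array type: five characters plus the zero byte gives const char[6].
static_assert(std::is_same<decltype("Hello"), const char (&)[6]>::value,
              "a string literal has type const char[N]");

// sizeof sees the whole array, terminator included.
constexpr std::size_t literal_size = sizeof("Hello");
static_assert(literal_size == 6, "5 characters + terminator");

// Even the empty literal occupies one element: the terminator itself.
constexpr std::size_t empty_literal_size = sizeof("");
static_assert(empty_literal_size == 1, "terminator only");
```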

Often, functions taking a constant string don't need the interface of std::basic_string, or would prefer to avoid dynamic allocation; they may just need, for instance, the string itself and its length. std::basic_string, in particular, has to offer a way to be constructed from the language's native string literals. Such functions offer a variant that takes a C-style string:

void function_that_takes_a_constant_string ( const char * /*const*/ s );

// Array-to-pointer decay happens, and takes away the string's length
function_that_takes_a_constant_string( "Hello, World!" );

As explained in this answer, arrays decay to pointers, but their dimensions are taken away. In the case of string literals, this means that their length, which was known at compile-time, is lost and must be recalculated at runtime by iterating through the pointed memory until a zero-byte is found. This is not optimal.

However, string literals, and, in general, arrays, may be passed as references using template parameter deduction to keep their size:

template<std::size_t N>
void function_that_takes_a_constant_string ( const char (& s)[N] );

// Transparent, and the string's length is kept
function_that_takes_a_constant_string( "Hello, World!" );
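To make the difference between the two signatures concrete, here is a minimal sketch, assuming C++14 or later (for the loop inside a constexpr function). `scanned_length` and `deduced_length` are hypothetical helper names, not part of any library.

```cpp
#include <cstddef>

// Pointer parameter: the extent is gone after decay, so the length has to be
// recovered by walking the characters up to the zero byte, as strlen does.
constexpr std::size_t scanned_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// Array-reference parameter: N is deduced from the argument's type, so the
// length is known without reading a single character.
template <std::size_t N>
constexpr std::size_t deduced_length(const char (&)[N]) {
    return N - 1;  // drop the terminator
}

static_assert(scanned_length("Hello, World!") == 13, "found by scanning");
static_assert(deduced_length("Hello, World!") == 13, "read off the type");
```

Both functions agree for a plain literal; the second one simply never has to touch the string's memory to get there.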

The function template could serve as a proxy for another function, the real one, which would take a pointer to the string and its length. That way the implementation would not have to be exposed in a header, and the length would still be preserved.

// Calling the wrapped function directly would be cumbersome.
// This wrapper is transparent and preserves the string's length.
template<std::size_t N> inline auto
function_that_takes_a_constant_string
( const char (& s)[N] )
{
    // `s` decays to a pointer
    // `N-1` is the length of the string
    return function_that_takes_a_constant_string_private_impl( s , N-1 );
}

// Isn't everyone happy now?
function_that_takes_a_constant_string( "Hello, World!" );
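The wrapper above calls an implementation function that is never shown. As a self-contained sketch of the same pattern, assuming C++14 or later, the hypothetical pair `count_char_impl` / `count_char` below stands in for the private function and its length-preserving front end; counting a character is only a placeholder for whatever the real function would do.

```cpp
#include <cstddef>

// Hypothetical stand-in for the "real" function: it receives the pointer and
// the already-known length, so it never searches for the terminator.
constexpr std::size_t count_char_impl(const char* s, std::size_t length, char c) {
    std::size_t hits = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (s[i] == c) {
            ++hits;
        }
    }
    return hits;
}

// Thin wrapper: deduces N from the array type and forwards N - 1 as length.
template <std::size_t N>
constexpr std::size_t count_char(const char (&s)[N], char c) {
    return count_char_impl(s, N - 1, c);  // `s` decays to a pointer here
}

static_assert(count_char("Hello, World!", 'l') == 3, "");
// Because the length comes from the type, an embedded zero byte does not
// cut the string short: the 'b' after it is still counted.
static_assert(count_char("a\0b", 'b') == 1, "");
```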

Why isn't this used more broadly? In particular, why doesn't std::basic_string have a constructor with the proposed signature?


Note: I don't know what the proposed parameter is called; if you do, please suggest an edit to the question's title.

Kalrish asked Jan 10 '23 21:01


2 Answers

It's largely historical, in a sense. You're correct that there's no real reason this can't be done (if you don't want to use your whole buffer, pass a length argument, right?). Still, a character array is usually a buffer, not all of which is in use at any one time:

char buf[MAX_LEN];

Since this is usually how they're used, it seems needless or even risky to go to the trouble of adding a new basic_string constructor template for const CharT (&)[N].
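The risk can be sketched as follows, assuming C++14 or later. `kMaxLen` plays the role of MAX_LEN with an arbitrary value of 64, and the two helper names are hypothetical; the point is only that an array-reference overload sees the buffer's capacity, not the string stored in it.

```cpp
#include <cstddef>

constexpr std::size_t kMaxLen = 64;  // assumed capacity, stands in for MAX_LEN

// What an array-reference overload would compute: extent minus one.
template <std::size_t N>
constexpr std::size_t capacity_based_length(const char (&)[N]) {
    return N - 1;
}

// What a const char* overload computes: distance to the first zero byte.
constexpr std::size_t terminator_based_length(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}

// A 64-element buffer holding the 5-character string "Hello";
// the remaining elements are zero-initialized.
constexpr char buf[kMaxLen] = "Hello";

static_assert(terminator_based_length(buf) == 5, "the string actually stored");
static_assert(capacity_based_length(buf) == 63, "the buffer size, not the string");
```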

The whole thing is pretty borderline though.

Lightness Races in Orbit answered Jan 13 '23 16:01


The trouble with adding such a templated overload is simple:

It would be used whenever the function is called with a static buffer of character type, even if the buffer as a whole is not a string and you really wanted to pass only the initial string. Embedded zeroes are far less common than terminating zeroes, and using only part of a buffer is very common. Existing code rarely forces the decay from array to pointer-to-first-element explicitly, with a cast or a function call, so such calls would silently select the new overload.

Demo code (on Coliru):

#include <stdio.h>
#include <string.h>

auto f(const char* s, size_t n) {
    printf("char* size_t %u\n", (unsigned)n);
    (void)s;
}
auto f(const char* s) {
    printf("char*\n");
    return f(s, strlen(s));
}
template<size_t N> inline auto
f( const char (& s)[N] ) {
    printf("char[%u]\n", (unsigned)N);
    return f(s, N-1);
}

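int main() {
    char buffer[] = "Hello World";
    f(buffer);      // array overload: N is 12, so length 11 is passed
    f(+buffer);     // unary + forces decay: char* overload, strlen gives 11
    buffer[5] = 0;  // the buffer now holds the string "Hello"
    f(buffer);      // array overload still passes 11
    f(+buffer);     // char* overload: strlen gives 5
}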

Keep in mind: If you talk about a string in C, it always denotes a 0-terminated string, while in C++ it can also denote a std::string, which is counted.

Deduplicator answered Jan 13 '23 17:01