
Are list-initialized char arrays still null-terminated?

Tags: c++, char, pointers

As I worked through the Lippman C++ Primer (5th ed, C++11), I came across this code:

char ca[] = {'C', '+', '+'};  //not null terminated
cout << strlen(ca) << endl;  //disaster: ca isn't null terminated

Calling the library strlen function on ca, which is not null-terminated, results in undefined behavior. Lippman et al. say that "the most likely effect of this call is that strlen will keep looking through the memory that follows ca until it encounters a null character."

A later exercise asks what the following code does:

const char ca[] = {'h','e','l','l','o'};
const char *cp = ca;
while (*cp) {
   cout << *cp << endl;
   ++cp;
}

My analysis: ca is a char array that is not null-terminated. cp, a pointer to const char, initially holds the address of ca[0]. The condition of the while loop dereferences cp, contextually converts the resulting char value to bool, and executes the loop body only if the conversion yields true. Since any non-null char converts to true, the loop body executes, printing the char and advancing the pointer by one char. The loop thus steps through memory, printing each char until it reaches a null character. Since ca is not null-terminated, the loop may continue well past the address of ca[4], interpreting the contents of later memory addresses as chars and writing their values to cout, until it happens to come across a byte that represents the null character (all 0 bits). This behavior would be similar to what Lippman et al. suggest strlen(ca) does in the earlier example.

However, when I actually execute the code (again compiling with g++ -std=c++11), the program consistently prints:

h
e
l
l
o

and terminates. Why?

Chad asked Jun 21 '16 21:06




1 Answer

Most likely explanation: on modern desktop and server operating systems such as Windows and Linux, memory is zeroed out before it is mapped into a program's address space. So as long as the program hasn't used the adjacent memory locations for something else, the array will look like a null-terminated string. In your case, the adjacent bytes are probably just padding, as most variables are at least 4-byte aligned.

As far as the language is concerned, this is just one possible realization of undefined behavior.

MikeMB answered Oct 26 '22 15:10