Character encodings Null-terminated strings require that the encoding does not use a zero byte (0x00) anywhere; therefore it is not possible to store every possible ASCII or UTF-8 string. However, it is common to store the subset of ASCII or UTF-8 – every character except NUL – in null-terminated strings.
The null character indicates the end of the string. Such strings are called null-terminated strings. The null terminator of a multibyte string consists of one byte whose value is 0. The null terminator of a wide-character string consists of one gl_wchar_t character whose value is 0.
No, but if you say temp. c_str() a null terminator will be included in the return from this method. It's also worth saying that you can include a null character in a string just like any other character.
From the horse's mouth
None of BCPL, B, or C supports character data strongly in the language; each treats strings much like vectors of integers and supplements general rules by a few conventions. In both BCPL and B a string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In BCPL, the first packed byte contains the number of characters in the string; in B, there is no count and strings are terminated by a special character, which B spelled
*e
. This change was made partially to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator.
Dennis M Ritchie, Development of the C Language
C doesn't have a string as part of the language. A 'string' in C is just a pointer to char. So maybe you're asking the wrong question.
"What's the rationale for leaving out a string type" might be more relevant. To that I would point out that C is not an object oriented language and only has basic value types. A string is a higher level concept that has to be implemented by in some way combining values of other types. C is at a lower level of abstraction.
I just want to point out that I'm not trying to say this is a stupid or bad question, or that the C way of representing strings is the best choice. I'm trying to clarify that the question would be more succinctly put if you take into account the fact that C has no mechanism for differentiating a string as a datatype from a byte array. Is this the best choice in light of the processing and memory power of todays computers? Probably not. But hindsight is always 20/20 and all that :)
The question is asked as a Length Prefixed Strings (LPS)
vs zero terminated strings (SZ)
thing, but mostly expose benefits of length prefixed strings. That may seem overwhelming, but to be honest we should also consider drawbacks of LPS and advantages of SZ.
As I understand it, the question may even be understood as a biased way to ask "what are the advantages of Zero Terminated Strings ?".
Advantages (I see) of Zero Terminated Strings:
"this\0is\0valid\0C"
.
Is it a string ? or four strings ? Or a bunch of bytes...char a[3] =
"foo";
is valid C (not C++) and
won't put a final zero in a.char*
. Namely
not to return the address of the string, but instead to return the actual data.That said, no need to complain in the rare case where standard C strings are indeed inefficient. Libs are available. If I followed that trend, I should complain that standard C does not include any regex support functions... but really everybody knows it's not a real problem as there is libraries available for that purpose. So when string manipulation efficiency is wanted, why not use a library like bstring ? Or even C++ strings ?
EDIT: I recently had a look to D strings. It is interesting enough to see that the solution choosed is neither a size prefix, nor zero termination. As in C, literal strings enclosed in double quotes are just short hand for immutable char arrays, and the language also has a string keyword meaning that (immutable char array).
But D arrays are much richer than C arrays. In the case of static arrays length is known at run-time so there is no need to store the length. Compiler has it at compile time. In the case of dynamic arrays, length is available but D documentation does not state where it is kept. For all we know, compiler could choose to keep it in some register, or in some variable stored far away from the characters data.
On normal char arrays or non literal strings there is no final zero, hence programmer has to put it itself if he wants to call some C function from D. In the particular case of literal strings, however the D compiler still put a zero at the end of each strings (to allow easy cast to C strings to make easier calling C function ?), but this zero is not part of the string (D does not count it in string size).
The only thing that disappointed me somewhat is that strings are supposed to be utf-8, but length apparently still returns a number of bytes (at least it's true on my compiler gdc) even when using multi-byte chars. It is unclear to me if it's a compiler bug or by purpose. (OK, I probably have found out what happened. To say to D compiler your source use utf-8 you have to put some stupid byte order mark at beginning. I write stupid because I know of not editor doing that, especially for UTF-8 that is supposed to be ASCII compatible).
I think, it has historical reasons and found this in wikipedia:
At the time C (and the languages that it was derived from) were developed, memory was extremely limited, so using only one byte of overhead to store the length of a string was attractive. The only popular alternative at that time, usually called a "Pascal string" (though also used by early versions of BASIC), used a leading byte to store the length of the string. This allows the string to contain NUL and made finding the length need only one memory access (O(1) (constant) time). But one byte limits the length to 255. This length limitation was far more restrictive than the problems with the C string, so the C string in general won out.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With