 

Why are there so many string types in C++?

I like C. I have a C book called "Full and Full C" and two C++ books, and I find these languages fantastic because of their incredible power and performance, but I end up abandoning many of my projects because of all these different types.

I mean, why have std::string, LPCSTR, System::String, TCHAR[], char s[] = "ss", char* s?

This causes tremendous headaches, mainly in GUI applications: the WinAPI has the problem of LPCSTR not being compatible with char or std::string, and in CLR applications there is also System::String, which is a big headache to convert to std::string, char* s, or even char s[].

Why don't C and C++ have a single, unique string type, like String in Java?

Samuel Ives asked Dec 10 '22 10:12


2 Answers

There aren't "many types of string in C++". Canonically there is one template, std::basic_string, which is basically a container specialized for strings of different character types.

std::string is a convenience typedef for std::basic_string<char>. There are more such typedefs for other underlying character types (std::wstring, std::u16string, std::u32string).
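A minimal sketch of that claim: the standard aliases are checked at compile time to be specializations of the one class template, and a single generic function (the name count_chars is purely illustrative) then works for all of them.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Each standard string alias is std::basic_string with a different character type.
static_assert(std::is_same<std::string,    std::basic_string<char>>::value,     "char");
static_assert(std::is_same<std::wstring,   std::basic_string<wchar_t>>::value,  "wchar_t");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value, "char16_t");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value, "char32_t");

// Hypothetical helper: because they share one template, generic code covers them all.
template <typename CharT>
std::size_t count_chars(const std::basic_string<CharT>& s) {
    return s.size();  // number of CharT elements, not bytes or user-visible characters
}

void demo_basic_string() {
    assert(count_chars(std::string("abc")) == 3);      // CharT = char
    assert(count_chars(std::u32string(U"abc")) == 3);  // CharT = char32_t
}
```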

AFAIK, standard C also has only one officially recognized string representation: the ANSI string, i.e. a null-terminated array of char.
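To make "null-terminated array of char" concrete, here is a small sketch (written as C++ using the C library functions; greeting and c_length are illustrative names). The array's storage includes the terminator, while the string's length, as computed by strlen, stops just before it.

```cpp
#include <cassert>
#include <cstring>

// A C ("ANSI") string: an array of char whose end is marked by '\0'.
// The literal "hello" occupies 6 chars: 5 letters plus the terminator.
const char greeting[] = "hello";

// strlen scans until it finds '\0'; it never knows the array's actual size.
std::size_t c_length(const char* s) {
    return std::strlen(s);
}

void demo_c_string() {
    assert(sizeof greeting == 6);     // storage includes the terminator
    assert(c_length(greeting) == 5);  // length excludes it
    assert(greeting[5] == '\0');
}
```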

All the others you mention are either equivalent to this (e.g. LPCSTR is a "long pointer to a constant string", i.e. const char*) or non-standard extensions written by library providers.
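Since LPCSTR is just const char*, moving between it and std::string is mostly a matter of c_str() in one direction and the const char* constructor in the other. The sketch below assumes the Windows definition (a pointer to const CHAR, where CHAR is char) and uses a hypothetical stand-in alias, LPCSTR_like, plus a hypothetical WinAPI-style function, so that it compiles without windows.h.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical stand-in for the Windows LPCSTR typedef (const CHAR*, CHAR = char).
using LPCSTR_like = const char*;

// Hypothetical API in the WinAPI style that accepts a null-terminated string or NULL.
std::size_t api_takes_lpcstr(LPCSTR_like text) {
    return text ? std::strlen(text) : 0;
}

void demo_interop() {
    std::string s = "from std::string";
    // std::string -> const char*: c_str() yields a null-terminated buffer that stays
    // valid until s is destroyed or a non-const member function is called on it.
    std::size_t n = api_takes_lpcstr(s.c_str());
    assert(n == s.size());

    // const char* -> std::string: the constructor copies characters up to the '\0'.
    LPCSTR_like raw = "from a C string";
    std::string back(raw);
    assert(back == "from a C string");
}
```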

Your question is like asking why there are so many GUI libraries: because there is no standard way to do it, or the standard way is lacking in some respect, and so library authors made a design decision to provide and support their own equivalent type.

The bottom line is that, at the library or language level, it's a design decision between different trade-offs: simplicity, performance, character support, etc. In general, storing text is hard.

luk32 answered Dec 25 '22 09:12


Well, first we must answer the question: What is a string?

The C standard defines it as a contiguous sequence of characters terminated by and including the first null character.[1]
It also mentions varieties using wchar_t, char16_t, or char32_t instead of char.
It also provides many functions for string manipulation, and string literals for notational convenience.

So, a sequence of characters can be a string, a char[] might hold a string, and a char* might point to one.
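The distinction between "holds a string" and "is just a char array" matters in practice. In this sketch (all names are illustrative), one array holds a string, another holds the same letters without a terminator and so is not a string, and a char* can point to the start of a string or into the middle of one.

```cpp
#include <cassert>
#include <cstring>

// Holds a string: the literal supplies the terminating '\0'.
char holds_string[] = "abc";           // {'a', 'b', 'c', '\0'}

// Does NOT hold a string: there is no '\0', so calling strlen on it is undefined behavior.
char just_chars[3] = {'a', 'b', 'c'};

void demo_char_arrays() {
    // A char* can point to the start of a string...
    char* p = holds_string;
    assert(std::strlen(p) == 3);
    // ...or into the middle of one: p + 1 points to the string "bc".
    assert(std::strcmp(p + 1, "bc") == 0);
    // just_chars can only be handled safely with an explicit element count.
    assert(std::memcmp(just_chars, "abc", 3) == 0);
}
```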
LPCSTR is a Windows typedef for const char* with the added semantics that it should point to a string or be NULL.
TCHAR is one of a number of types switched by preprocessor definitions (UNICODE / _UNICODE), used for transitioning Windows code from char to wchar_t. Depending on what TCHAR is, a TCHAR[] might hold a string or a wide string.
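The real TCHAR and its helpers live in the Windows headers. The sketch below only re-creates the idea with hypothetical names (MY_UNICODE, MY_TCHAR, MY_T, my_tcslen) so it compiles without windows.h; it is an approximation of the mechanism, not the actual header contents. One macro decides whether the same source line produces a char string or a wide string.

```cpp
#include <cassert>
#include <cstring>
#include <cwchar>

// Hypothetical re-creation of the TCHAR switch.
// Define MY_UNICODE before this point to get the wide-character variant.
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x) L##x          // "hello" becomes L"hello"
#define my_tcslen std::wcslen
#else
typedef char MY_TCHAR;
#define MY_T(x) x             // "hello" stays "hello"
#define my_tcslen std::strlen
#endif

// The same source line yields either a char[] or a wchar_t[] string.
MY_TCHAR title[] = MY_T("hello");

std::size_t title_length() {
    return my_tcslen(title);
}

void demo_tchar() {
    assert(title_length() == 5);
    assert(sizeof title / sizeof title[0] == 6);  // 5 characters + terminator
}
```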


C++ mixes things up a bit because it adds a data type for handling strings. To reduce ambiguity, "string" is used only for the abstract concept; you have to rely on context to disambiguate, or be more explicit.

So the C string corresponds to the C++ null-terminated byte string, or NTBS.[2]
And yes, C++ also has the wide varieties.
And C++ incorporates the C functions and adds some more.
In addition, C++ has std::basic_string<> for storing all kinds of counted strings, and some convenience typedefs like std::string.
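One practical difference between a counted string and an NTBS: std::basic_string stores its length, so it can contain embedded '\0' characters, while anything that treats the data as an NTBS stops at the first one. A minimal sketch (demo_counted_string is an illustrative name):

```cpp
#include <cassert>
#include <cstring>
#include <string>

void demo_counted_string() {
    // Pointer + count constructor: all 3 chars are kept, including the '\0'.
    std::string s("a\0b", 3);
    assert(s.size() == 3);

    // Viewed as an NTBS through c_str(), the data appears to end at the first '\0'.
    assert(std::strlen(s.c_str()) == 1);

    // Constructing from a pointer alone treats it as an NTBS and stops at '\0'.
    std::string t("a\0b");
    assert(t.size() == 1);
}
```

This is why converting a std::string to a C-style API is not always lossless: the API can only see up to the first null character.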


And now we get to a third language, namely C++/CLI.
It incorporates everything I described above from C++, and adds the CLI type System::String to the mix.
System::String is an immutable UTF-16 counted-string.


Now the question of why C++ does not define one single concrete type to be "the" string can be answered:

There are different types of string in C++ for interoperability, history, efficiency and convenience. Always use the right tool for the job.
Java and .NET do the same with byte arrays, char arrays, string builders and the like.


Reference 1: C11 final draft, definition of string:

7. Library

7.1 Introduction

7.1.1 Definitions of terms

1 A string is a contiguous sequence of characters terminated by and including the first null character. The term multibyte string is sometimes used instead to emphasize special processing given to multibyte characters contained in the string or to avoid confusion with a wide string. A pointer to a string is a pointer to its initial (lowest addressed) character. The length of a string is the number of bytes preceding the null character and the value of a string is the sequence of the values of the contained characters, in order.

Reference 2: C++1z draft N4659, definition of NTBS:

20.4.2.1.5.1 Byte strings [byte.strings]

1 A null-terminated byte string, or NTBS, is a character sequence whose highest-addressed element with defined content has the value zero (the terminating null character); no other element in the sequence has the value zero.
2 The length of an NTBS is the number of elements that precede the terminating null character. An empty ntbs has a length of zero.
3 The value of an NTBS is the sequence of values of the elements up to and including the terminating null character.
4 A static NTBS is an NTBS with static storage duration.

Deduplicator answered Dec 25 '22 09:12