
C++ lowercase vs. all-caps datatypes

Why does C++ (MSVS) define all-caps datatypes, when most of them are the same as the lowercase ones?

  1. These are exactly the same. Why are the all-caps versions defined?

    double and typedef double DOUBLE

    char and typedef char CHAR

  2. bool and BOOL (typedef int BOOL): both the lowercase and the all-caps names represent Boolean states, so why is int used in the latter?

What extra capability was gained through these additional datatypes?
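
For reference, these names come from the Windows SDK headers rather than from the compiler itself. A simplified sketch of the declarations in question (the real ones live in windef.h/winnt.h, and exact spellings vary by SDK version):

    typedef int    BOOL;    // note: int, not bool
    typedef char   CHAR;
    typedef double DOUBLE;

    #define FALSE 0
    #define TRUE  1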

asked Nov 29 '22 by ItzMe


1 Answer

The ALLCAPS typedefs started in the very first days of Windows programming (1.0 and before). Back then, for example, there was no such thing as a bool type. The Windows APIs and headers were defined for old-school C. C++ didn't even exist back when they were being developed.

So to help document the APIs better, type names like BOOL were introduced as typedefs or macros in the SDK headers. Even though BOOL and INT were both names for the same underlying type (int), this let you look at a function's type signature to see whether an argument or return value was intended as a boolean value (defined as "0 for false, any nonzero value for true") or an arbitrary integer.
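
For illustration, consider two declarations like these (the function names here are made up, not real Windows APIs):

    typedef int BOOL;
    typedef int INT;

    // To the compiler, both parameters and both return types are plain int,
    // but the names tell the reader which values are yes/no flags.
    BOOL IsWindowVisibleEx(INT windowId);     // returns a true/false answer
    INT  GetWindowCount(BOOL includeHidden);  // returns an arbitrary count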

As another example, consider LPCSTR. In 16-bit Windows, there were two kinds of pointers: near pointers were 16-bit pointers, and far pointers used both a 16-bit "segment" value and a 16-bit offset into that segment. The actual memory address was calculated in the hardware as ( segment << 4 ) + offset.
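
A quick worked example of that arithmetic, written in modern C++ just to show the math:

    #include <cstdio>

    int main() {
        // A far pointer was a pair of 16-bit values: (segment, offset).
        unsigned segment = 0xB800;   // e.g. the text-mode video segment
        unsigned offset  = 0x0010;
        // The hardware combined them into a 20-bit physical address:
        unsigned physical = (segment << 4) + offset;
        std::printf("%05X\n", physical);  // prints B8010
    }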

There were macros or typedefs for each of these kinds of pointers. NPSTR was a near pointer to a character string, and LPSTR was a far pointer to a character string. If it was a const string, then a C would get added in: NPCSTR or LPCSTR.

You could compile your code in either "small" model (using near pointers by default) or "large" model (using far pointers by default). The various NPxxx and LPxxx "types" would explicitly specify the pointer size, but you could also omit the L or N and just use PSTR or PCSTR to declare a writable or const pointer that matched your current compilation mode.
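
A rough sketch of how the 16-bit headers could have expressed this (near and far were compiler extensions, and the exact declarations varied by SDK version):

    typedef char near *NPSTR;          // near pointer to a string
    typedef char far  *LPSTR;          // far pointer to a string
    typedef const char near *NPCSTR;   // near pointer to a const string
    typedef const char far  *LPCSTR;   // far pointer to a const string

    // PSTR/PCSTR picked up the compilation model's default pointer size:
    typedef char *PSTR;
    typedef const char *PCSTR;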

Most Windows API functions used far pointers, so you would generally see LPxxx pointers there.

BOOL vs. INT was not the only case where two names were synonyms for the same underlying type. Consider a case where you had a pointer to a single character, not a zero-terminated string of characters. There was a name for that too. You would use PCH for a pointer to a character to distinguish it from PSTR which pointed to a zero-terminated string.

Even though the underlying pointer type was exactly the same, this helped document the intent of your code. Of course there were all the same variations: PCCH for a pointer to a constant character, NPCH and LPCH for the explicit near and far, and of course NPCCH and LPCCH for near and far pointers to a constant character. Yes, the use of C in these names to represent both "const" and "char" was confusing!
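
In sketch form, the character-pointer and string-pointer names were identical to the compiler and differed only in documented intent (the helper functions below are hypothetical, purely for illustration):

    typedef char *PCH;          // pointer to a character (raw buffer)
    typedef char *PSTR;         // pointer to a zero-terminated string
    typedef const char *PCCH;   // pointer to a constant character
    typedef const char *PCSTR;  // pointer to a constant string

    // Hypothetical helpers showing the documented intent:
    void FillBuffer(PCH buffer, int count);  // writes count chars, no terminator
    int  StringLength(PCSTR text);           // expects a zero terminator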

When Windows moved to 32 bits with a "flat" memory model, there were no more near or far pointers, just flat 32-bit pointers for everything. But all of these type names were preserved so that old code could continue compiling; they were simply all collapsed into one. So NPSTR, LPSTR, plain PSTR, and all the other variations mentioned above became synonyms for the same pointer type (with or without a const modifier).
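
A sketch of the collapsed 32-bit declarations, where the near/far qualifiers become empty macros:

    #define NEAR
    #define FAR

    typedef char NEAR *NPSTR;  // now just char *
    typedef char FAR  *LPSTR;  // now just char *
    typedef char      *PSTR;   // the same type as both of the above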

Unicode came along around that same time, and most unfortunately, UTF-8 had not been invented yet. So Unicode support in Windows took the form of 8-bit characters for ANSI and 16-bit characters (UCS-2, later UTF-16) for Unicode. Yes, at that time, people thought 16-bit characters ought to be enough for anyone. How could there possibly be more than 65,536 different characters in the world?! (Famous last words...)

You can guess what happened here. Windows applications could be compiled in either ANSI or Unicode ("wide character") mode, meaning that their default character pointers would be either 8-bit or 16-bit. You could use all of the type names above and they would match the mode your app was compiled in. Almost all Windows APIs that took string or character pointers came in both ANSI and Unicode versions, with an A or W suffix on the actual function name. For example, SetWindowText( HWND hwnd, LPCSTR lpString ) became two functions: SetWindowTextA( HWND hwnd, LPCSTR lpString ) and SetWindowTextW( HWND hwnd, LPCWSTR lpString ). And SetWindowText itself became a macro defined as one or the other of those, depending on whether you compiled for ANSI or Unicode.
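
The header mechanics look roughly like this (simplified from winuser.h; the real declarations also carry calling-convention macros):

    BOOL SetWindowTextA(HWND hwnd, LPCSTR  lpString);  // 8-bit "ANSI" strings
    BOOL SetWindowTextW(HWND hwnd, LPCWSTR lpString);  // 16-bit "wide" strings

    #ifdef UNICODE
    #define SetWindowText SetWindowTextW
    #else
    #define SetWindowText SetWindowTextA
    #endif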

Back then, you might have actually wanted to write your code so that it could be compiled either in ANSI or Unicode mode. So in addition to the macro-ized function name, there was also the question of whether to use "Howdy" or L"Howdy" for your window title. The TEXT() macro (more commonly known as _T() today) fixed this. You could write:

SetWindowText( hwnd, TEXT("Howdy") );

and it would compile to either of these depending on your compilation mode:

SetWindowTextA( hwnd, "Howdy" );

SetWindowTextW( hwnd, L"Howdy" );
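
The TEXT() macro itself is just conditional token pasting; a simplified version:

    #ifdef UNICODE
    #define TEXT(s) L##s  // "Howdy" becomes L"Howdy", a wide string literal
    #else
    #define TEXT(s) s     // "Howdy" stays an 8-bit string literal
    #endif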

Of course, most of this is moot today. Nearly everyone compiles their Windows apps in Unicode mode. That is the native mode on all modern versions of Windows, and the ...A versions of the API functions are shims/wrappers around the native Unicode ...W versions. By compiling for Unicode you avoid going through all those shim calls. But you can still compile your app in ANSI (or "multi-byte character set") mode if you want, so all of these macros still exist.

answered Dec 01 '22 by Michael Geary