I would like to write the following function:
int char_index(char c)
{
    if (is_ascii<char>)
        return c - 'A';
    else
        return c == 'A' ? 0 :
               c == 'B' ? 1 :
               // ...
}
Is there a function like is_ascii in std? I'm imagining something like std::numeric_limits<T>::is_iec559, which says whether some floating-point type T satisfies the requirements of the IEEE 754 standard.
I think I can implement is_ascii myself with something like if (65 == 'A' && ...) that enumerates the entire ASCII charset and compares each character to its int representation, but that's annoying. Also, I'm not sure how to check non-printable characters like SOH (Start Of Heading), etc.
Is it even possible to write this function in user code, or do I have to rely on the implementation to provide such a function?
I assume you want to check whether your compiler uses ASCII encoding when it translates the string literals and character literals in your source code to machine code.
Is there a function like is_ascii in std?
Not that I know of.
I can implement is_ascii myself with something like if (65 == 'A' && ...) that enumerates the entire ASCII charset
So do that. Check the characters that can be a c-char, that is, everything in the basic source character set:
a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9
_ { } [ ] # ( ) < > % : ; . ? * + - / ^ & | ~ ! = , \ " '
and escape sequences:
\a \b \f \n \r \t \v
There's no way to check the "entire" ASCII charset, because the compiler doesn't map the program onto the entire ASCII charset. It only maps basic character set characters and escape sequences to its machine representation, not the whole charset (although there may be compiler extensions).
but that's annoying.
But that's the only way. To verify that your implementation uses a given character set, you have to check every character it uses. So check them. The check runs at compile time (constexpr or consteval) anyway.
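A sketch that may make the enumeration less tedious (this is not part of the original answer): a u8 literal is always encoded as UTF-8, and UTF-8 agrees with ASCII for all of the characters listed above, so comparing an ordinary literal against its u8 twin checks the same thing without spelling out every code. The names BASIC_CHARS, same_code_units and literals_match_ascii are made up for illustration, and the sketch assumes C++14 or later and a single-byte ordinary literal encoding.

```cpp
#include <cstddef>

// Basic source characters plus the simple escape sequences
// (hypothetical macro, used so the text is written only once).
#define BASIC_CHARS \
    "abcdefghijklmnopqrstuvwxyz" \
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ" \
    "0123456789" \
    "_{}[]#()<>%:;.?*+-/^&|~!=,\\\"' " \
    "\a\b\f\n\r\t\v"

// Compares two arrays code unit by code unit. Templated so that it accepts
// both char (ordinary literals) and char8_t (u8 literals since C++20).
template <class A, class B, std::size_t N>
constexpr bool same_code_units(const A (&lhs)[N], const B (&rhs)[N])
{
    for (std::size_t i = 0; i < N; ++i) {
        if (static_cast<unsigned char>(lhs[i]) !=
            static_cast<unsigned char>(rhs[i]))
            return false;
    }
    return true;
}

// u8"" BASIC_CHARS concatenates into one UTF-8 literal with the same text,
// so any difference comes from the ordinary literal encoding.
constexpr bool literals_match_ascii()
{
    return same_code_units(BASIC_CHARS, u8"" BASIC_CHARS);
}
```

With an ASCII-compatible literal encoding this should evaluate to true; with something like -fexec-charset=IBM-1047 the ordinary literal is converted while the u8 one is not, so it should evaluate to false.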
how to check non-printable characters like SOH (Start Of Heading), etc.
Don't. The SOH character can't appear as itself inside a character literal, so you don't have to check it: the language gives you no way to name it. There is no \SOH escape sequence, and the 0x01 byte is not in the basic character set. Your compiler never translates a sequence of characters to the SOH character. A valid program is composed only of characters from the basic source character set. The interpretation of the SOH character is up to whatever receives it, and if I write '\001' it's going to be a byte equal to 1 regardless of the encoding.
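To illustrate that last point, a minimal sketch: octal and hexadecimal escapes spell out a numeric value directly, so the literal encoding never gets involved. The constant names soh and a_has_ascii_code are only for illustration.

```cpp
// Numeric escapes give the value of the code unit directly;
// no character set conversion is applied to them.
constexpr char soh = '\001';  // octal escape
static_assert(soh == 1, "octal escape yields the value 1");
static_assert('\x01' == 1, "hex escape yields the value 1");

// By contrast, 'A' goes through the literal encoding: it is 0x41 in ASCII
// and a different value under an EBCDIC code page.
constexpr bool a_has_ascii_code = ('A' == 0x41);
```

The two static_asserts hold under any literal encoding, while a_has_ascii_code depends on it.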
Meh, let's write it! The following program:
#include <algorithm>
// True if every basic source character and simple escape sequence
// has its ASCII code in the literal encoding.
constexpr bool compiler_uses_ascii() {
    return
        '\a'==0x07 && '\b'==0x08 && '\t'==0x09 && '\n'==0x0a && '\v'==0x0b && '\f'==0x0c &&
        '\r'==0x0d && '!'==0x21 && '#'==0x23 && '%'==0x25 && '&'==0x26 && '\''==0x27 &&
        ' '==0x20 && '"'==0x22 &&  // space and double quote
        '('==0x28 && ')'==0x29 && '*'==0x2a && '+'==0x2b && ','==0x2c && '-'==0x2d &&
        '.'==0x2e && '/'==0x2f && '0'==0x30 && '1'==0x31 && '2'==0x32 && '3'==0x33 &&
        '4'==0x34 && '5'==0x35 && '6'==0x36 && '7'==0x37 && '8'==0x38 && '9'==0x39 &&
        ':'==0x3a && ';'==0x3b && '<'==0x3c && '='==0x3d && '>'==0x3e && '?'==0x3f &&
        'A'==0x41 && 'B'==0x42 && 'C'==0x43 && 'D'==0x44 && 'E'==0x45 && 'F'==0x46 &&
        'G'==0x47 && 'H'==0x48 && 'I'==0x49 && 'J'==0x4a && 'K'==0x4b && 'L'==0x4c &&
        'M'==0x4d && 'N'==0x4e && 'O'==0x4f && 'P'==0x50 && 'Q'==0x51 && 'R'==0x52 &&
        'S'==0x53 && 'T'==0x54 && 'U'==0x55 && 'V'==0x56 && 'W'==0x57 && 'X'==0x58 &&
        'Y'==0x59 && 'Z'==0x5a && '['==0x5b && '\\'==0x5c && ']'==0x5d && '^'==0x5e &&
        '_'==0x5f && 'a'==0x61 && 'b'==0x62 && 'c'==0x63 && 'd'==0x64 && 'e'==0x65 &&
        'f'==0x66 && 'g'==0x67 && 'h'==0x68 && 'i'==0x69 && 'j'==0x6a && 'k'==0x6b &&
        'l'==0x6c && 'm'==0x6d && 'n'==0x6e && 'o'==0x6f && 'p'==0x70 && 'q'==0x71 &&
        'r'==0x72 && 's'==0x73 && 't'==0x74 && 'u'==0x75 && 'v'==0x76 && 'w'==0x77 &&
        'x'==0x78 && 'y'==0x79 && 'z'==0x7a && '{'==0x7b && '|'==0x7c && '}'==0x7d &&
        '~'==0x7e;
}
constexpr int char_index(char c)
{
    if constexpr (compiler_uses_ascii()) {
        return c - 'A';
    } else {
        // Encoding-independent fallback: position of c in the alphabet.
        const char a[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        const char *end = a + sizeof(a) - 1;  // exclude the terminating '\0'
        const char *pos = std::find(a, end, c);
        return pos == end ? -1 : static_cast<int>(pos - a);
#if 0
        // Equivalent hand-written chain:
        return
            c == 'A' ? 0 : c == 'B' ? 1 : c == 'C' ? 2 : c == 'D' ? 3 :
            c == 'E' ? 4 : c == 'F' ? 5 : c == 'G' ? 6 : c == 'H' ? 7 :
            c == 'I' ? 8 : c == 'J' ? 9 : c == 'K' ? 10 : c == 'L' ? 11 :
            c == 'M' ? 12 : c == 'N' ? 13 : c == 'O' ? 14 : c == 'P' ? 15 :
            c == 'Q' ? 16 : c == 'R' ? 17 : c == 'S' ? 18 : c == 'T' ? 19 :
            c == 'U' ? 20 : c == 'V' ? 21 : c == 'W' ? 22 : c == 'X' ? 23 :
            c == 'Y' ? 24 : c == 'Z' ? 25 : -1;
#endif
    }
}
#include <iostream>
int main() {
    std::cout << compiler_uses_ascii() << " " << char_index('B') << "\n";
}
when executed outputs:
$ g++ 1.cpp -std=c++20 && ./a.out
1 1
$ g++ 1.cpp -fexec-charset=IBM-1047 -std=c++20 && ./a.out
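0@1%
The @ and % in the second run are most likely the " " and "\n" literals, written out in their EBCDIC encodings (0x40 and 0x25) and displayed by an ASCII terminal. The values themselves are 0 (not ASCII) and 1 (the index of 'B').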
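A side note, as a sketch rather than part of the original program: the alphabet lookup in the else branch works under any encoding, so the ASCII branch is only a shortcut. Writing the search as a plain loop also avoids needing C++20 for a constexpr std::find. The name char_index_portable is made up here.

```cpp
// Hypothetical encoding-independent variant: returns 0..25 for 'A'..'Z'
// and -1 for anything else, without assuming the letters are contiguous.
constexpr int char_index_portable(char c)
{
    constexpr char alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    // sizeof includes the terminating '\0', so stop one short of it.
    for (int i = 0; i < static_cast<int>(sizeof(alphabet)) - 1; ++i) {
        if (alphabet[i] == c)
            return i;
    }
    return -1;
}

static_assert(char_index_portable('A') == 0, "first letter maps to 0");
static_assert(char_index_portable('Z') == 25, "last letter maps to 25");
```

Unlike c - 'A', this version also reports -1 for characters outside 'A'..'Z', at the cost of a linear scan.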