A very easy (and rather elegant) way to convert a char containing a lower-case letter into an int is the following:
int convertLowercaseCharLettertoInt(char letter) {
    return letter - 'a';
}
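For example, the intended mapping is 'a' to 0, 'b' to 1, ..., 'z' to 25. A quick self-contained check (not part of the original question; it repeats the function so the snippet compiles on its own, and assumes an ASCII-compatible character set):

#include <stdio.h>

int convertLowercaseCharLettertoInt(char letter) {
    return letter - 'a';
}

int main(void) {
    /* Prints: a=0 c=2 z=25 on an ASCII-compatible system. */
    printf("a=%d c=%d z=%d\n",
           convertLowercaseCharLettertoInt('a'),
           convertLowercaseCharLettertoInt('c'),
           convertLowercaseCharLettertoInt('z'));
    return 0;
}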
However, this code assumes that the character encoding follows the same ordering as the alphabet, or, more generally, that char follows the ASCII encoding. Java's char is UTF-16, while C's char is typically ASCII-based. Although UTF-16 is not byte-compatible with ASCII, the first 128 code points have the same values and ordering in both. So is the ordering of the first 128 chars the same in all major languages such as C, C++, Java, C#, JavaScript and Python?
What about switch statement approaches? The hash-map approach is, I think, the most elegant way to solve this problem for non-English alphabets. E.g. the Czech alphabet goes: a, á, b, c, č, d, ď, e, é, ě, f, g, h, ch, i, í, j, k, l, m, n, ň, o, ó, p, q, r, ř, s, š, t, ť, u, ú, ů, v, w, x, y, ý, z, ž. (A rough sketch of such a lookup is shown below.)
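For illustration only (not from the original question): C has no built-in hash map, so this sketch uses a plain lookup table over wide characters instead. The function name is made up, and the digraph "ch" (a letter of its own, between h and i) is deliberately not handled, so real Czech collation would need more work:

#include <wchar.h>

/* Hypothetical sketch: index a single letter within the Czech alphabet.
 * The digraph "ch" is skipped; -1 means "not a (single-character) Czech letter". */
int czechLetterToInt(wchar_t letter) {
    static const wchar_t alphabet[] = L"aábcčdďeéěfghiíjklmnňoópqrřsštťuúůvwxyýzž";
    const wchar_t *occurrence = wcschr(alphabet, letter);
    return (letter && occurrence) ? (int)(occurrence - alphabet) : -1;
}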
No. ASCII is standard in every language and in every embedded system that is in use.
UTF-8 supports any Unicode character, which pragmatically means any natural language (Coptic, Sinhala, Phoenician, Cherokee, etc.), as well as many non-spoken languages (music notation, mathematical symbols, APL).
Java actually uses Unicode, which includes ASCII and other characters from languages around the world.
The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.
This has less to do with the programming language and more to do with the system's underlying character set. ASCII and all variants of Unicode will behave as you expect: 'a'...'z' are 26 consecutive code points. EBCDIC will not, so your trick will fail on an IBM/360 in most languages; in EBCDIC the letters 'a'-'i', 'j'-'r' and 's'-'z' occupy three separate, non-contiguous ranges.
Java (and Python, and perhaps other languages) mandate Unicode encoding regardless of the underlying platform, so your trick will work there as well, assuming you can find a conforming Java implementation for your IBM mainframe.
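If the execution character set is not known up front, the same assumption can also be verified at run time; a minimal sketch (the helper name is invented here), complementing the compile-time check in the next answer:

#include <stdbool.h>

/* Returns true if 'a'..'z' are 26 consecutive character codes at run time. */
bool lowercaseLettersAreContiguous(void) {
    static const char lowercase[] = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 0; i + 1 < 26; i++) {
        if (lowercase[i] + 1 != lowercase[i + 1])
            return false;
    }
    return true;
}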
In C, the compiler can detect the problem at compile time and pick an implementation accordingly:
#include <assert.h>
#include <string.h>

#if 'a'+1=='b' && 'b'+1=='c' && 'c'+1=='d' && 'd'+1=='e' && 'e'+1=='f' \
 && 'f'+1=='g' && 'g'+1=='h' && 'h'+1=='i' && 'i'+1=='j' && 'j'+1=='k' \
 && 'k'+1=='l' && 'l'+1=='m' && 'm'+1=='n' && 'n'+1=='o' && 'o'+1=='p' \
 && 'p'+1=='q' && 'q'+1=='r' && 'r'+1=='s' && 's'+1=='t' && 't'+1=='u' \
 && 'u'+1=='v' && 'v'+1=='w' && 'w'+1=='x' && 'x'+1=='y' && 'y'+1=='z'
/* Lower-case letters are contiguous: plain subtraction is enough. */
int convertLowercaseCharLettertoInt(char letter) {
    return letter - 'a';
}
#else
/* Fallback for non-contiguous encodings: look the letter up in a table. */
int convertLowercaseCharLettertoInt(char letter) {
    static const char lowercase[] = "abcdefghijklmnopqrstuvwxyz";
    const char *occurrence = strchr(lowercase, letter);
    assert(letter && occurrence);   /* reject '\0' and non-lower-case input */
    return (int)(occurrence - lowercase);
}
#endif
See also @John Bode's code.
Note: the following works with all C character encodings, because strtol() is required to treat the letters a-z (or A-Z) as the digits 10 through 35 in base 36, regardless of their code-point values:

#include <stdlib.h>

int convertLowercaseOrUppercaseCharLettertoInt(char letter) {
    /* In base 36, 'a'/'A' is digit 10 and 'z'/'Z' is digit 35, in every encoding. */
    char s[2] = { letter, '\0' };
    return (int)strtol(s, 0, 36) - 10;
}
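A quick usage check (not from the original answer; it assumes the function above is linked in and asserts are enabled). One caveat worth noting: for a character that is not a base-36 digit at all, strtol() converts nothing and returns 0, so the function yields -10:

#include <assert.h>

int convertLowercaseOrUppercaseCharLettertoInt(char letter);  /* defined above */

int main(void) {
    assert(convertLowercaseOrUppercaseCharLettertoInt('a') == 0);
    assert(convertLowercaseOrUppercaseCharLettertoInt('A') == 0);   /* case-insensitive */
    assert(convertLowercaseOrUppercaseCharLettertoInt('z') == 25);
    assert(convertLowercaseOrUppercaseCharLettertoInt('?') == -10); /* not a base-36 digit */
    return 0;
}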