 

Is the char encoding the same across programming languages?

Tags: java, c++, python, c, char

A very easy (and kind of elegant) way to convert a char containing a lower-case letter into an int is to do the following:

int convertLowercaseCharLettertoInt(char letter) {
    return letter - 'a';
}

However, this code assumes that the char encoding follows the same ordering as the alphabet. Or, more generally, it assumes that char follows the ASCII encoding.

  • I know that Java char is UTF-16 while C char is ASCII. Although UTF-16 is not backward-compatible with ASCII, the ordering of the first 128 characters is the same in both. So is the ordering of the first 128 chars the same in all major languages such as C, C++, Java, C#, JavaScript and Python?
  • Is the method above a safe thing to do in general (assuming the input is sanitized, etc.)? Or is it better to use hash-map or long switch statement approaches? The hash-map approach is, I think, the most elegant way to solve this problem in the case of non-English alphabets. E.g. the Czech alphabet goes: a, á, b, c, č, d, ď, e, é, ě, f, g, h, ch, i, í, j, k, l, m, n, ň, o, ó, p, q, r, ř, s, š, t, ť, u, ú, ů, v, w, x, y, ý, z, ž. (A table-lookup sketch is shown right after this list.)
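
As a hedged illustration of that lookup idea (not part of the original question): in C, letters such as á or č are multi-byte sequences in UTF-8 and "ch" is a digraph, so a simple alternative to a hash map is a table of UTF-8 strings searched by index. The function name and table below are only illustrative, and the sketch assumes both the source file and the input use UTF-8.

#include <stdio.h>
#include <string.h>

/* Czech alphabet as UTF-8 strings; several letters are multi-byte and
   "ch" is a two-character digraph, so letter - 'a' arithmetic cannot
   work here. */
static const char *czech_alphabet[] = {
    "a", "á", "b", "c", "č", "d", "ď", "e", "é", "ě", "f", "g", "h", "ch",
    "i", "í", "j", "k", "l", "m", "n", "ň", "o", "ó", "p", "q", "r", "ř",
    "s", "š", "t", "ť", "u", "ú", "ů", "v", "w", "x", "y", "ý", "z", "ž"
};

/* Hypothetical helper: returns the 0-based position in the Czech
   alphabet, or -1 if the string is not a Czech letter. */
int convertCzechLetterToInt(const char *letter) {
    size_t n = sizeof czech_alphabet / sizeof czech_alphabet[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(czech_alphabet[i], letter) == 0)
            return (int)i;
    return -1;
}

int main(void) {
    printf("%d\n", convertCzechLetterToInt("č"));  /* prints 4 */
    printf("%d\n", convertCzechLetterToInt("ch")); /* prints 13 */
    return 0;
}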
Asked Aug 28 '15 by Augustin


People also ask

Is the ASCII value the same for all languages?

Yes. ASCII is a standard, so the ASCII value of a character is the same in every programming language and on every system that uses ASCII.

Does UTF-8 support all languages?

UTF-8 supports any Unicode character, which pragmatically means any natural language (Coptic, Sinhala, Phoenician, Cherokee, etc.), as well as many non-spoken languages (music notation, mathematical symbols, APL).

Does Java use ASCII encoding?

Java actually uses Unicode, which includes ASCII and other characters from languages around the world.

What is the character encoding standard used in Java language?

The native character encoding of the Java programming language is UTF-16. A charset in the Java platform therefore defines a mapping between sequences of sixteen-bit UTF-16 code units (that is, sequences of chars) and sequences of bytes.


2 Answers

This has less to do with the programming language and more to do with the system's underlying character set. ASCII and all variants of Unicode will behave as you expect: 'a'...'z' are 26 consecutive code points. EBCDIC will not, so your trick will fail on an IBM/360 in most languages.

Java and Python (and perhaps other languages) mandate Unicode encoding regardless of the underlying platform, so your trick will work there as well, assuming you can find a conforming Java implementation for your IBM mainframe.
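
As a small hedged addition (not part of the original answer): the same assumption can also be checked at run time in C. The helper name below is only illustrative.

#include <stdio.h>

/* Returns 1 if 'a'..'z' are 26 consecutive code points in the execution
   character set (true for ASCII and Unicode, false for EBCDIC). */
int lowercase_letters_are_consecutive(void) {
    const char *alphabet = "abcdefghijklmnopqrstuvwxyz";
    for (int i = 1; i < 26; i++)
        if (alphabet[i] != alphabet[i - 1] + 1)
            return 0;
    return 1;
}

int main(void) {
    puts(lowercase_letters_are_consecutive()
             ? "letter - 'a' is safe on this system"
             : "letter - 'a' is NOT safe on this system");
    return 0;
}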

Answered Sep 29 '22 by Lee Daniel Crocker


In C, the compiler can detect the problem at compile time:

#include <assert.h>
#include <string.h>

#if 'a'+1=='b' && 'b'+1=='c' && 'c'+1=='d' && 'd'+1=='e' && 'e'+1=='f' \
  && 'f'+1=='g' && 'g'+1=='h' && 'h'+1=='i' && 'i'+1=='j' && 'j'+1=='k'\
  && 'k'+1=='l' && 'l'+1=='m' && 'm'+1=='n' && 'n'+1=='o' && 'o'+1=='p'\
  && 'p'+1=='q' && 'q'+1=='r' && 'r'+1=='s' && 's'+1=='t' && 't'+1=='u'\
  && 'u'+1=='v' && 'v'+1=='w' && 'w'+1=='x' && 'x'+1=='y' && 'y'+1=='z'

/* Letters are consecutive (e.g. ASCII/Unicode): simple arithmetic works. */
int convertLowercaseCharLettertoInt(char letter) {
  return letter - 'a';
}
#else
  /* Letters are not consecutive (e.g. EBCDIC): fall back to a table search. */
  int convertLowercaseCharLettertoInt(char letter) {
    static const char lowercase[] = "abcdefghijklmnopqrstuvwxyz";
    const char *occurrence = strchr(lowercase, letter);
    assert(letter && occurrence);
    return occurrence - lowercase;
  }
#endif

See also @John Bode's code.


Note: The following works with all C character encodings

#include <stdlib.h>

/* strtol() in base 36 maps 'a'-'z' (or 'A'-'Z') to the values 10-35
   regardless of the character encoding, so subtracting 10 yields 0-25. */
int convertLowercaseOrUppercaseCharLettertoInt(char letter) {
  char s[2] = { letter, '\0' };
  return strtol(s, 0, 36) - 10;
}
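
Why this is encoding-independent: the C standard defines strtol so that the letters a through z (or A through Z) stand for the values 10 through 35 when parsing in base 36, so the mapping does not depend on the letters having contiguous code points. A minimal usage sketch follows (my addition, repeating the function above so it compiles on its own):

#include <stdio.h>
#include <stdlib.h>

int convertLowercaseOrUppercaseCharLettertoInt(char letter) {
  char s[2] = { letter, '\0' };
  return strtol(s, 0, 36) - 10;
}

int main(void) {
    /* Letters map to 0..25 in either case; a non-letter such as '?' is
       not converted by strtol(), which then returns 0, giving -10. */
    printf("%d %d %d %d\n",
           convertLowercaseOrUppercaseCharLettertoInt('a'),   /* 0 */
           convertLowercaseOrUppercaseCharLettertoInt('Z'),   /* 25 */
           convertLowercaseOrUppercaseCharLettertoInt('z'),   /* 25 */
           convertLowercaseOrUppercaseCharLettertoInt('?'));  /* -10 */
    return 0;
}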
Answered Sep 29 '22 by chux - Reinstate Monica