
Accented character literals in Java

I tried to type char literals for accented vowels in Java, but the compiler says something like: unclosed character literal

This is what I'm trying to do:

 char [] a = {'à', 'á', 'â', 'ä' };

I've tried using Unicode escapes such as '\u00E0', but for some reason they don't match the characters in my string:

 for( char c : string.toCharArray() ) {
     for( char accented : a ) {
         if( c == accented ) {
             // I've found a funny letter
         }
     }
 }

The if never evaluates to true, no matter what I put in my string.

Here's the complete program I'm trying to code.

asked Feb 07 '26 06:02 by OscarRyz


2 Answers

The code should be compiled with the correct encoding:

javac -encoding UTF-8 Foo.java

There is probably an encoding mismatch somewhere between the editor that saved the file and the compiler that reads it.
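One plausible way such a mismatch leads to "unclosed character literal" is sketched below. It assumes the file was saved as UTF-8 but decoded with a single-byte charset such as ISO-8859-1; that is a guess about the asker's setup, not something the question confirms. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encodes the text as UTF-8, then decodes those bytes with the wrong
    // (single-byte) charset, mimicking a compiler that misreads the file.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // U+00E0 is written as an escape so this file compiles under any encoding.
        String wrong = misdecode("\u00E0");
        // The single letter has turned into two chars, which no longer
        // fit inside one pair of single quotes.
        System.out.println(wrong.length());
    }
}
```

Under that assumption the compiler effectively sees two characters between the quotes, which is not a valid char literal.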

public class Foo {
  char [] a = {'à', 'á', 'â', 'ä' };  
}

The above code, saved as UTF-8, should produce the following hex dump:

70 75 62 6C 69 63 20 63 6C 61 73 73 20 46 6F 6F         public class Foo
20 7B 0D 0A 20 20 63 68 61 72 20 5B 5D 20 61 20          {__  char [] a
3D 20 7B 27 C3 A0 27 2C 20 27 C3 A1 27 2C 20 27         = {'__', '__', '
C3 A2 27 2C 20 27 C3 A4 27 20 7D 3B 20 20 0D 0A         __', '__' };  __
7D 0D 0A 0D 0A                                          }____

The UTF-8 value for code point U+00E0 (à) is C3 A0.
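The byte values above can be checked from Java itself. This is a small sketch with hypothetical names (Utf8Bytes, hex) that prints the UTF-8 bytes of a string:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    // Returns the UTF-8 encoding of s as space-separated upper-case hex bytes.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u00E0")); // prints C3 A0
    }
}
```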

There is an outside chance that à will be represented by the combining sequence U+0061 U+0300. This is the decomposed form (NFD), though I've never come across a text editor that used it as a default for text entry. As Thorbjørn Ravn Andersen points out, it is often better to always use \uXXXX escape sequences, since they are less ambiguous.
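If decomposed input is a concern, one option is to normalize strings to the composed form (NFC) before comparing chars. The sketch below uses java.text.Normalizer from the standard library; the class and method names are illustrative only.

```java
import java.text.Normalizer;

public class NormalizeDemo {
    // Converts a string to NFC, so "a" followed by a combining grave
    // accent becomes the single precomposed character U+00E0.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "a\u0300"; // two chars: U+0061 U+0300
        String composed = toNfc(decomposed);
        System.out.println(decomposed.length()); // prints 2
        System.out.println(composed.length());   // prints 1
        System.out.println(composed.equals("\u00E0")); // prints true
    }
}
```

Without the normalization step, a char-by-char comparison against '\u00E0' would not match the decomposed input.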

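You also need to check the encoding of your input source (file, console, etc.), because the string you compare against must be decoded with the right charset too.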
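For input, one way to rule out the platform default charset is to name the charset explicitly when wrapping the stream. This sketch assumes the input really is UTF-8, and the class and method names are hypothetical; a ByteArrayInputStream stands in for a file or console stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadInput {
    // Reads the first line of the stream, decoding the bytes as UTF-8
    // instead of relying on the platform default charset.
    static String firstLine(InputStream in) {
        try {
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The two bytes C3 A0 are the UTF-8 encoding of U+00E0.
        byte[] bytes = {(byte) 0xC3, (byte) 0xA0};
        String line = firstLine(new ByteArrayInputStream(bytes));
        System.out.println(line.equals("\u00E0")); // prints true
    }
}
```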

As a last resort, you can dump your chars as hex with System.out.format("%04x", (int) c); and decode them manually with a character inspector to find out what they are.
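The hex dump suggested above could be wrapped in a small helper like the following sketch (CharDump and dump are made-up names). It formats each char of a string with the same "%04x" pattern:

```java
public class CharDump {
    // Formats every UTF-16 code unit of s as four lower-case hex digits.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u00E0"));  // prints 00e0
        System.out.println(dump("a\u0300")); // prints 0061 0300
    }
}
```

Seeing 00c3 00a0 instead of 00e0 in such a dump would point to UTF-8 bytes having been decoded with a single-byte charset.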

answered Feb 08 '26 19:02 by McDowell


For Unicode characters to work, you must be certain that javac reads the source file in the same encoding in which it was written.

You will save yourself a lot of trouble by just using the \uXXXX notation.
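As a sketch of that advice applied to the question's array, the four vowels can be written with escapes so the source file stays pure ASCII. The class name and the countAccented helper are illustrative, not part of the original program.

```java
public class AccentFinder {
    // U+00E0, U+00E1, U+00E2 and U+00E4 written as escapes, so the
    // meaning does not depend on how the source file is encoded.
    private static final char[] ACCENTED = {'\u00E0', '\u00E1', '\u00E2', '\u00E4'};

    // Counts how many chars of s appear in the ACCENTED array.
    static int countAccented(String s) {
        int count = 0;
        for (char c : s.toCharArray()) {
            for (char accented : ACCENTED) {
                if (c == accented) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countAccented("\u00E0 la carte")); // prints 1
        System.out.println(countAccented("plain text"));      // prints 0
    }
}
```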

answered Feb 08 '26 20:02 by Thorbjørn Ravn Andersen
