I have the following code
import java.util.Arrays;

public class MainDefault {
    public static void main(String[] args) {
        System.out.println("²³");
        // getBytes() without arguments encodes with the JVM default charset
        System.out.println(Arrays.toString("²³".getBytes()));
    }
}
But I can't seem to print the special characters to the console.
When I compile and run it as follows, I get the following result:
$ javac MainDefault.java
$ java MainDefault
On the other hand, when I compile it and run it like this
$ javac -encoding UTF8 MainDefault.java
$ java MainDefault
And when I run it using the file encoding UTF8 flag, I get the following
$ java -Dfile.encoding=UTF8 MainDefault
It doesn't seem to be a problem with the console (Git Bash on Windows 10), as it prints the characters normally.
Thanks for your help
0xC0, 0xC1, 0xF5, 0xF6, 0xF7, 0xF8, 0xF9, 0xFA, 0xFB, 0xFC, 0xFD, 0xFE, 0xFF are invalid UTF-8 code units. A UTF-8 code unit is 8 bits. If by char you mean an 8-bit byte, then the invalid UTF-8 code units would be char values that do not appear in UTF-8 encoded text.
Every UTF (UTF-8, UTF-16, UTF-32) can represent any Unicode character you need. UTF-8 is based on 8-bit code units, and each character is encoded as 1 to 4 bytes. The first 128 Unicode code points are encoded as 1 byte in UTF-8.
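To make the 1-to-4-byte rule concrete, here is a minimal sketch that counts the UTF-8 bytes for a few sample characters. The class name Utf8Lengths and the helper utf8Length are made up for illustration, and the string literals use Unicode escapes so the example does not depend on the encoding of the source file.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // U+0041, inside the first 128 code points
        System.out.println(utf8Length("\u00B2"));       // U+00B2, superscript two
        System.out.println(utf8Length("\u20AC"));       // U+20AC, euro sign
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, written as a surrogate pair
    }
}
```

The four calls cover the 1-, 2-, 3- and 4-byte cases respectively, so the characters from the question each take two bytes in UTF-8.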
Your code is not printing the right characters in the console because your Java program and the console are using different character sets, that is, different encodings.
If you want to obtain the same characters, you first need to determine which character sets are in place.
This process will depend on the "console" in which you are outputting your results.
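On the Java side, one way to see which charset the JVM has picked is to print it. This is only a sketch: the class name ShowEncodings is hypothetical, and the stdout.encoding property is an assumption about your JDK version (it exists on recent JDKs and may print null on older ones).

```java
import java.nio.charset.Charset;

public class ShowEncodings {
    // Collects the charset-related settings the running JVM reports.
    static String describe() {
        return "file.encoding   = " + System.getProperty("file.encoding") + "\n"
             + "default charset = " + Charset.defaultCharset().name() + "\n"
             // Assumption: only newer JDKs define this property; otherwise null.
             + "stdout.encoding = " + System.getProperty("stdout.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as java ShowEncodings and once as java -Dfile.encoding=UTF8 ShowEncodings should show whether the flag changes what the JVM uses.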
If you are working with Windows and cmd, as @RickJames suggested, you can use the chcp command to determine the active code page.
Oracle documents the full list of encodings supported by Java, and their correspondence with other aliases (code pages in this case), in this page.
This Stack Overflow answer also provides some guidance about the mapping between Windows code pages and Java charsets.
As you can see in the provided links, the code page for UTF-8 is 65001.
If you are using Git Bash (MinTTY), you can follow @kriegaex's instructions to verify or configure UTF-8 as the terminal emulator encoding.
Linux and UNIX, or UNIX-derived systems like macOS, do not use code page identifiers, but locales. The locale information can vary between systems, but you can either use the locale command or inspect the LC_* environment variables to find the required information.
This is the output of the locale command on my system:
LANG="es_ES.UTF-8"
LC_COLLATE="es_ES.UTF-8"
LC_CTYPE="es_ES.UTF-8"
LC_MESSAGES="es_ES.UTF-8"
LC_MONETARY="es_ES.UTF-8"
LC_NUMERIC="es_ES.UTF-8"
LC_TIME="es_ES.UTF-8"
LC_ALL=
Once you know this information, you need to run your Java program with the file.encoding VM option corresponding to the right charset:
java -Dfile.encoding=UTF8 MainDefault
Some classes, like PrintStream or PrintWriter, allow you to specify the Charset in which the information will be written.
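As a rough illustration of that option, the sketch below wraps System.out in a PrintStream that always encodes as UTF-8, regardless of file.encoding. The class name Utf8Out and the helper utf8Stream are hypothetical, and the output will only look right if the terminal itself is set to UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Wraps any output stream in an auto-flushing PrintStream that encodes text as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a mandatory charset on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Unicode escapes for the two superscript characters from the question,
        // so the result does not depend on how javac reads this file.
        out.println("\u00B2\u00B3");
    }
}
```

The escapes sidestep the source-encoding question entirely, which helps separate compile-time problems from output problems.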
The javac -encoding option only lets you specify the character encoding used by the source files.
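To see why the source encoding matters, the following sketch simulates what happens when a UTF-8 file is read with a single-byte charset. It assumes the source was saved as UTF-8 and uses ISO-8859-1 as a stand-in for whatever default javac picks up on your machine; the class name SourceEncodingMismatch and the helper misread are made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class SourceEncodingMismatch {
    // Encodes text as UTF-8, then decodes those bytes as ISO-8859-1,
    // mimicking a tool that reads a UTF-8 file with the wrong charset.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misread("\u00B2\u00B3");
        // Each two-byte UTF-8 sequence turns into two separate characters.
        System.out.println(garbled.length());
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

The two original characters become four, each pair starting with U+00C2, which is the typical sign that the string literal was already wrong before the program ran.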
If you are using Windows with Git Bash, consider also reading this @rmunge answer: it provides information about a possible bug in the tool that may be the reason for the problem and that prevents the terminal from running correctly out of the box without the need for manual encoding adjustments.
I am also using Git Bash on Windows 10 and it works totally fine for me.
Here's how it prints,
Terminal version is mintty 3.0.2 (x86_64-pc-msys)
and my text properties were,
So, I tried to reproduce your output by changing the character set.
By setting Character Set to CP437 (OEM codepage) (note that this automatically changed Locale to C too), I was able to get the same output as you.
Then, after I changed it back to UTF-8 (Unicode), I got the expected output!
Therefore, it is clear that the problem is with your console's Character Set.