How does Java determine the encoding used for System.out?
Given the following class:
import java.io.File;
import java.io.PrintWriter;

public class Foo {
    public static void main(String[] args) throws Exception {
        String s = "xxäñxx";
        System.out.println(s);
        PrintWriter out = new PrintWriter(new File("test.txt"), "UTF-8");
        out.println(s);
        out.close();
    }
}
It is saved as UTF-8 and compiled with javac -encoding UTF-8 Foo.java
on a Windows system.
Afterwards on a git-bash console (using UTF-8 charset) I do:
$ java Foo
xxõ±xx
$ java -Dfile.encoding=UTF-8 Foo
xxäñxx
$ cat test.txt
xxäñxx
$ java Foo | cat
xxäñxx
$ java -Dfile.encoding=UTF-8 Foo | cat
xxäñxx
What is going on here?
Apparently Java checks whether stdout is connected to a terminal and changes its encoding in that case. Is there a way to force Java to simply output plain UTF-8?
I tried the same with the cmd console, too. Redirecting STDOUT does not seem to make any difference there: without the file.encoding parameter it outputs the ANSI encoding; with the parameter it outputs UTF-8.
The native character encoding of the Java programming language is UTF-16.
Java supports a wide array of encodings and conversions between them. The class Charset defines a set of standard charsets which every implementation of the Java platform is required to support. This includes US-ASCII, ISO-8859-1, UTF-8, and UTF-16, to name a few.
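As a sketch (the class and variable names here are illustrative, not from the original post), these guaranteed charsets can be used to convert text between encodings:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    public static void main(String[] args) {
        // Every JVM must support this name, so forName cannot fail here.
        Charset latin1 = Charset.forName("ISO-8859-1");
        // Encode the same text with two different charsets:
        // "ä" and "ñ" each take 2 bytes in UTF-8 but 1 byte in ISO-8859-1.
        byte[] utf8 = "xxäñxx".getBytes(StandardCharsets.UTF_8);
        byte[] iso  = "xxäñxx".getBytes(latin1);
        System.out.println(utf8.length + " " + iso.length); // prints 8 6
    }
}
```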
When the JVM is started through scripts or tools, the default charset can be set via the environment variable JAVA_TOOL_OPTIONS, e.g. -Dfile.encoding=UTF-16 (or any other charset), which is then picked up by every JVM that starts on the machine.
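Whatever default the JVM ends up with can be inspected at runtime; a minimal sketch:

```java
import java.nio.charset.Charset;

public class DefaultCharset {
    public static void main(String[] args) {
        // Reflects the charset resolved at JVM startup
        // (influenced by the platform, -Dfile.encoding, JAVA_TOOL_OPTIONS, ...)
        System.out.println(Charset.defaultCharset().name());
        System.out.println(System.getProperty("file.encoding"));
    }
}
```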
UTF-8 is a variable-width character encoding. For ASCII text it is as compact as ASCII itself, yet it can represent any Unicode character at the cost of some extra bytes. UTF stands for Unicode Transformation Format; the '8' signifies that it uses 8-bit code units.
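The variable width is easy to observe by encoding single characters (a small illustrative example):

```java
import java.nio.charset.StandardCharsets;

public class Utf8Widths {
    public static void main(String[] args) {
        // ASCII characters take 1 byte in UTF-8; other characters take 2-4.
        System.out.println("x".getBytes(StandardCharsets.UTF_8).length); // 1
        System.out.println("ä".getBytes(StandardCharsets.UTF_8).length); // 2
        System.out.println("€".getBytes(StandardCharsets.UTF_8).length); // 3
    }
}
```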
I'm assuming that your console still runs under cmd.exe. I doubt your console is really expecting UTF-8 - I expect it is really an OEM DOS encoding (e.g. 850 or 437.)
Java will encode bytes using the default encoding set during JVM initialization.
Reproducing on my PC:
$ java Foo
Java encodes as windows-1252; the console decodes as IBM850. Result: mojibake.
$ java -Dfile.encoding=UTF-8 Foo
Java encodes as UTF-8; the console decodes as IBM850. Result: mojibake.
$ cat test.txt
cat decodes the file as UTF-8; cat re-encodes as IBM850; the console decodes as IBM850. Result: correct output.
$ java Foo | cat
Java encodes as windows-1252; cat decodes as windows-1252; cat re-encodes as IBM850; the console decodes as IBM850. Result: correct output.
$ java -Dfile.encoding=UTF-8 Foo | cat
Java encodes as UTF-8; cat decodes as UTF-8; cat re-encodes as IBM850; the console decodes as IBM850. Result: correct output.
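The first case can be reproduced in pure Java by encoding with one charset and decoding with the other (the class name Mojibake is just illustrative; both charset names are supported by the JDK):

```java
import java.nio.charset.Charset;

public class Mojibake {
    public static void main(String[] args) {
        String s = "xxäñxx";
        // Encode with the charset Java used (windows-1252), then decode with
        // the charset the console assumed (IBM850):
        // ä (0xE4) -> õ, ñ (0xF1) -> ±
        byte[] bytes = s.getBytes(Charset.forName("windows-1252"));
        String garbled = new String(bytes, Charset.forName("IBM850"));
        System.out.println(garbled); // prints xxõ±xx
    }
}
```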
This implementation of cat must use heuristics to determine if the character data is UTF-8 or not, then transcodes the data from either UTF-8 or ANSI (e.g. windows-1252) to the console encoding (e.g. IBM850.)
This can be confirmed with the following commands:
$ java HexDump utf8.txt
78 78 c3 a4 c3 b1 78 78
$ cat utf8.txt
xxäñxx
$ java HexDump ansi.txt
78 78 e4 f1 78 78
$ cat ansi.txt
xxäñxx
The cat command can make this determination because e4 f1
is not a valid UTF-8 sequence.
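A sketch of how such a validity check can be done in Java, using a CharsetDecoder configured to reject malformed input (the class Utf8Check and its helper are illustrative, not how any particular cat implements it):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    static boolean isValidUtf8(byte[] data) {
        // By default a decoder replaces bad input; REPORT makes it throw instead.
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = {0x78, 0x78, (byte) 0xc3, (byte) 0xa4,
                       (byte) 0xc3, (byte) 0xb1, 0x78, 0x78}; // xxäñxx in UTF-8
        byte[] ansi = {0x78, 0x78, (byte) 0xe4, (byte) 0xf1, 0x78, 0x78};
        System.out.println(isValidUtf8(utf8)); // true
        System.out.println(isValidUtf8(ansi)); // false: e4 f1 is malformed UTF-8
    }
}
```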
You can correct the Java output by making the two encodings agree: for example, set -Dfile.encoding to the console's code page (java -Dfile.encoding=IBM850 Foo), or switch the console to UTF-8 first (chcp 65001 in cmd) and run with -Dfile.encoding=UTF-8.
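Alternatively (not shown in the original post, but a common workaround using standard APIs), you can bypass the default charset entirely by replacing System.out with a PrintStream that always encodes UTF-8:

```java
import java.io.PrintStream;

public class ForceUtf8 {
    public static void main(String[] args) throws Exception {
        // Wrap stdout in a PrintStream that always emits UTF-8 bytes,
        // regardless of the default charset picked at JVM startup.
        System.setOut(new PrintStream(System.out, true, "UTF-8"));
        System.out.println("xxäñxx");
    }
}
```

The console must of course be set to decode UTF-8 (e.g. chcp 65001) for this to display correctly.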
HexDump is a trivial Java application:
import java.io.*;

class HexDump {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            int r;
            while ((r = in.read()) != -1) {
                System.out.format("%02x ", 0xFF & r);
            }
            System.out.println();
        }
    }
}