 

java console charset translation

Tags:

java

Console input (Windows): how does the charset conversion work?

With the code below, non-ASCII characters come out as garbage. Note that the InputStreamReader in this example is not given a charset argument.

BufferedReader console = new BufferedReader( new InputStreamReader(System.in));
String inp = console.readLine();
System.out.println(inp.toUpperCase());

Since Java is OS-independent, how does it deal with all the different charset configurations a console prompt might use for input?

asked Dec 23 '11 13:12 by Teson





2 Answers

Actually, Java doesn't handle this problem at all.

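It simply assumes that the console encoding is the same as the system default encoding. This assumption is wrong on Windows, where the console typically uses an OEM code page (such as cp437 or cp850) while the default encoding is an ANSI code page (such as windows-1252). Therefore Java doesn't provide a good solution for correct console I/O with non-ASCII characters on Windows.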
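To make the mismatch concrete, here is a minimal sketch that reproduces it without a real console. It assumes, purely for illustration, that the console hands over UTF-8 bytes while the reader decodes them as ISO-8859-1 (standing in for a default encoding that doesn't match); on a real Windows machine the pair would more likely be an OEM and an ANSI code page. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Decodes one line of simulated console bytes with the given charset,
    // the same way an InputStreamReader wrapped around System.in would.
    static String decodeLine(byte[] consoleBytes, Charset readerCharset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(consoleBytes), readerCharset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // new InputStreamReader(System.in) with no charset argument uses this one.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Assumption: the console sent "caf" + U+00E9 encoded as UTF-8 ...
        byte[] typed = "caf\u00e9\n".getBytes(StandardCharsets.UTF_8);
        // ... decoding with a different charset turns the accented e into two wrong chars.
        String garbled = decodeLine(typed, StandardCharsets.ISO_8859_1);
        String correct = decodeLine(typed, StandardCharsets.UTF_8);
        System.out.println(garbled.equals("caf\u00c3\u00a9")); // true
        System.out.println(correct.equals("caf\u00e9"));       // true
    }
}
```

The only difference between the two calls is the charset handed to InputStreamReader, which is exactly the argument the question's code leaves out.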

Possible solutions are:

  • Use System.console(), introduced in Java 6, whose reader and writer use the console's own encoding:

    BufferedReader in = new BufferedReader(System.console().reader());
    PrintWriter out = new PrintWriter(System.console().writer(), true);
    
    out.println(in.readLine().toUpperCase());
    

    Note that System.console() can return null when your program runs with redirected I/O, for example in an IDE. You need a fallback for this case.

  • Specify the console encoding explicitly:

    String consoleEncoding = "...";
    BufferedReader in = new BufferedReader(new InputStreamReader(System.in, consoleEncoding));
    PrintWriter out = new PrintWriter(new OutputStreamWriter(System.out, consoleEncoding), true);
    
    out.println(in.readLine().toUpperCase());
    

    As far as I know, there is no good way to determine the actual console encoding programmatically without native code.

  • Make the console encoding the default encoding via the file.encoding property, so that the assumption that console I/O uses the default encoding becomes correct:

    java -Dfile.encoding=... ...
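The first two options can be combined so that a program still works when System.console() is null. The sketch below is one possible way to do it, not a canonical one: ConsoleIO and its methods are hypothetical names, and the UTF-8 fallback is an assumption you would replace with whatever encoding the redirected stream really uses. The Console and the streams are passed in as parameters only so the fallback path can be exercised without a terminal.

```java
import java.io.BufferedReader;
import java.io.Console;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ConsoleIO {
    // Uses the Console's reader when there is one; otherwise decodes `in`
    // with an explicitly chosen charset instead of the platform default.
    static BufferedReader reader(Console console, InputStream in, Charset fallback) {
        if (console != null) {
            return new BufferedReader(console.reader());
        }
        return new BufferedReader(new InputStreamReader(in, fallback));
    }

    // Same idea for output; the fallback PrintWriter auto-flushes on println.
    static PrintWriter writer(Console console, OutputStream out, Charset fallback) {
        if (console != null) {
            return console.writer(); // already a PrintWriter
        }
        return new PrintWriter(new OutputStreamWriter(out, fallback), true);
    }

    public static void main(String[] args) throws IOException {
        Console console = System.console();        // null under redirection or in many IDEs
        Charset fallback = StandardCharsets.UTF_8; // assumption: adjust to your setup
        BufferedReader in = reader(console, System.in, fallback);
        PrintWriter out = writer(console, System.out, fallback);
        String line = in.readLine();
        if (line != null) {
            out.println(line.toUpperCase());
        }
        out.flush();
    }
}
```

Keeping the explicit charset in the fallback branch avoids silently reintroducing the "default encoding" assumption that causes the original problem.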
    
answered Oct 21 '22 04:10 by axtavt



1) Practically speaking: how do character encodings work, and how should you deal with them?

Any character stream that is read in is decoded from bytes, and any character stream written out is encoded to bytes. Java bundles the encoders and decoders for the supported encodings as part of the JDK: http://docs.oracle.com/javase/1.6/docs/guide/intl/encoding.doc.html. Example: UTF-8 issue in Java code.
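A small sketch of what "bundled" means in practice: the JDK's charsets are looked up by name (aliases included) through java.nio.charset.Charset, and the same character becomes different bytes depending on which one you pick. The class and helper names are made up for the example. The set of available charsets varies between JDK builds, so the sketch only relies on UTF-8 and ISO-8859-1, which every Java platform is required to support.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BundledCharsets {
    // Encodes the text with the named charset, or returns null if this JDK
    // does not provide a charset by that name.
    static byte[] encodeWith(String text, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return text.getBytes(Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String e = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        // The same char becomes different bytes under different charsets.
        System.out.println(Arrays.toString(encodeWith(e, "UTF-8")));      // [-61, -87]
        System.out.println(Arrays.toString(encodeWith(e, "ISO-8859-1"))); // [-23]
        // An alias resolves to the same canonical charset.
        System.out.println(Charset.forName("UTF8").equals(StandardCharsets.UTF_8)); // true
        System.out.println(Charset.availableCharsets().size() + " charsets available");
    }
}
```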

2) Your specific question: how does the cross-platform Java language handle console input, which is OS-specific?

The short answer: although Java bytecode is platform-neutral, the JVM is not. That is, the System.in/out/err streaming functionality is not implemented entirely in plain Java.

When you run Java, the System class, which abstracts the basic notion of the system the JVM is running in, is loaded. Its input, output, and error streams (i.e. the objects you access when you type System.in, System.out, and System.err) are then set up at runtime during JVM startup (in the JDK 6 sources, by the private method System.initializeSystemClass(), which the JVM calls after the class is loaded).

In the case of System, this setup is a sophisticated task, as you imply, because initializing the System class (just like the Runtime class) is a lower-level JVM implementation issue that is OS-specific.

Again, to be clear: although the Java language is platform-independent, the JVM for your platform is an OS-specific environment that creates, at runtime, the resources our code references.
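One way to see that System.in, System.out, and System.err are ordinary objects handed to the System class at runtime is that a program can replace them itself with System.setOut (and System.setIn / System.setErr). The sketch below, with hypothetical helper names, swaps in a PrintStream with an explicit encoding and captures the bytes it produces. Wrapping FileDescriptor.out the same way is a common way to force an output encoding for the real stdout, though whether the console displays those bytes correctly still depends on its own code page.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

class StdStreams {
    // A PrintStream over the process's real stdout with an explicit encoding.
    static PrintStream stdoutWithEncoding(String encoding) throws UnsupportedEncodingException {
        return new PrintStream(new FileOutputStream(FileDescriptor.out), true, encoding);
    }

    // Temporarily swaps System.out for an in-memory stream using `encoding`
    // and returns the bytes that `action` printed.
    static byte[] captureStdout(Runnable action, String encoding) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            System.setOut(new PrintStream(buffer, true, encoding));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
        try {
            action.run();
        } finally {
            System.out.flush();
            System.setOut(original); // restore the stream the JVM installed
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        byte[] bytes = captureStdout(() -> System.out.print("\u00e9"), "UTF-8");
        System.out.println(Arrays.toString(bytes)); // [-61, -87]
        // Assumption: the terminal actually expects UTF-8.
        System.setOut(stdoutWithEncoding("UTF-8"));
        System.out.println("done");
    }
}
```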

For a deeper understanding, check out the actual source code of the System class; it's very readable and will give you a better understanding of what's going on. In particular, look at the nullInputStream() method:

http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Core/lang/java/lang/System.java.htm

answered Oct 21 '22 03:10 by jayunit100
