Printing unicode to console

Tags: java, encoding, unicode

I'm trying to create a custom print stream that can print localized messages to the console, and I ran into a problem doing this on Windows. Here is what I'm attempting to do:

1. I have a Unicode string.
2. Convert the Unicode string to bytes using UTF-8 encoding.
3. Convert the bytes to a new string with the console encoding.
4. Print the new string to the console with the console encoding.

In this code, I tried to do the above steps, but it fails miserably. Strangely, the default System.out.println call works correctly. However, I want to use a custom print stream and not rely on the default System.out.

Can someone explain how I can print Unicode to the console using my custom print stream? And why is the default System.out already equipped to print things correctly?

Here is my code - I compiled it and ran it from the command line. I set my system locale to zh-CN beforehand.

import java.io.PrintStream;
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) throws Exception {
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println(defaultCharset);
        // charset is windows-1252

        String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";

        System.out.println(unicodeMessage);
        // string is printed correctly using System.out (世界你好！)

        byte[] sourceBytes = unicodeMessage.getBytes("UTF-8");
        String data = new String(sourceBytes, defaultCharset.name());

        PrintStream out = new PrintStream(System.out, true, defaultCharset.name());
        out.println(data);
        // prints gibberish: ??–????????????
    }
}


asked Dec 18 '15 17:12

HAL

2 Answers

The windows-1252 charset is the problem here. We need to use the UTF-8 charset to print. The following worked for me:

import java.io.PrintStream;
import java.nio.charset.Charset;

public class Main {
    public static void main(String[] args) throws Exception {
        Charset utf8Charset = Charset.forName("UTF-8");
        Charset defaultCharset = Charset.defaultCharset();
        System.out.println(defaultCharset);
        // charset is windows-1252

        String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";

        System.out.println(unicodeMessage);
        // string is printed correctly using System.out (世界你好！)

        byte[] sourceBytes = unicodeMessage.getBytes("UTF-8");
        String data = new String(sourceBytes, defaultCharset.name());

        // Only change from the question: the custom stream encodes with UTF-8
        PrintStream out = new PrintStream(System.out, true, utf8Charset.name());
        out.println(data);
    }
}
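A variation on the same idea, sketched here as an assumption rather than what this answer ran: if the console really is decoding UTF-8 (for example after switching it with `chcp 65001`, which is itself an assumption about your setup), the original string can be handed straight to a UTF-8 `PrintStream` without the `getBytes`/`new String` round trip. The class and method names are hypothetical, and the stream writes to a `ByteArrayOutputStream` so the exact bytes can be inspected. `PrintStream(OutputStream, boolean, Charset)` needs Java 10 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8StreamSketch {
    // Encodes the message with a UTF-8 PrintStream and returns the raw bytes written.
    static byte[] encodeViaUtf8Stream(String message) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(sink, true, StandardCharsets.UTF_8);
        out.print(message); // the original String goes in; no manual getBytes/new String step
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";
        byte[] written = encodeViaUtf8Stream(unicodeMessage);
        // Each of the 5 characters takes 3 bytes in UTF-8.
        System.out.println(written.length);
        System.out.println(Arrays.equals(written, unicodeMessage.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Whether those bytes then display correctly is up to the console, not to Java.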


answered Oct 21 '22 06:10

Darshan Mehta

You have a number of issues and misunderstandings. Firstly,

byte[] sourceBytes = unicodeMessage.getBytes("UTF-8");
String data = new String(sourceBytes , defaultCharset.name());

data is now full of mojibake: you've decoded UTF-8 bytes as windows-1252. You then print this string through a UTF-8 encoder, and System.out hands the result to a console that expects its own codepage. That's three levels of broken.
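The first layer can be reproduced in isolation. This sketch (hypothetical class name; it assumes the JDK provides windows-1252, as standard JDKs do) decodes the UTF-8 bytes of the message as windows-1252 and compares the result with the original:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeSketch {
    // Reproduces the question's step: UTF-8 bytes decoded with windows-1252.
    static String decodeWrongly(String message) {
        byte[] utf8Bytes = message.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";
        String data = decodeWrongly(unicodeMessage);
        // windows-1252 is single-byte: one char per byte, so 15 chars instead of 5.
        System.out.println(unicodeMessage.length() + " -> " + data.length()); // 5 -> 15
        System.out.println(data.equals(unicodeMessage)); // false
    }
}
```

Each CJK character is 3 bytes in UTF-8, and windows-1252 turns every byte into its own character, so 5 characters become 15 that have nothing to do with the original text; the leading byte 0xE4, for example, becomes ä.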

Now, the reason System.out.println(unicodeMessage); works is that you set your locale correctly. Java uses this (the codepage of the console), not defaultCharset, to set up System.out for the console.
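For the custom print stream itself, a minimal sketch: give the stream the original String and a charset that matches the console's codepage, so there is exactly one encoding step. The helper `newConsoleStream` is hypothetical, and the assumption that a zh-CN console uses code page 936 (GBK in Java) should be checked on your machine. Output is captured in a byte array so the encoded bytes can be inspected; `PrintStream(OutputStream, boolean, Charset)` requires Java 10+.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;

public class ConsoleStreamSketch {
    // Hypothetical helper: wraps a byte sink in a PrintStream that encodes text with consoleCharset.
    static PrintStream newConsoleStream(OutputStream sink, Charset consoleCharset) {
        return new PrintStream(sink, true, consoleCharset);
    }

    // Encodes text through the stream and returns the bytes the console would receive.
    static byte[] render(String text, Charset consoleCharset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = newConsoleStream(sink, consoleCharset);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";
        Charset cp1252 = Charset.forName("windows-1252");
        // windows-1252 cannot represent the CJK characters, so the encoder substitutes '?' for them.
        System.out.println(new String(render(unicodeMessage, cp1252), cp1252));
        // GBK (code page 936) can represent them; availability depends on the JDK's installed charsets.
        if (Charset.isSupported("GBK")) {
            System.out.println(render(unicodeMessage, Charset.forName("GBK")).length + " bytes in GBK");
        }
        // For a real console, wrap new FileOutputStream(FileDescriptor.out) instead of a byte array.
    }
}
```

Characters the chosen charset cannot represent are silently replaced with `?` rather than raising an error, which is one way question marks end up on the console.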

The problem you'll face is that the Windows console doesn't support UTF-8. You'll be OK printing characters from your codepage, but not others. Find another solution, such as writing to a file or sending the results to a web page.
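A minimal sketch of the file-based alternative, assuming Java 7+ NIO and a hypothetical class name; the temporary file stands in for whatever output location you choose. The file is written as UTF-8, so a UTF-8-aware editor or browser can display it regardless of the console codepage:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8FileSketch {
    // Writes the message to a file encoded as UTF-8, followed by a line separator.
    static void writeUtf8(Path target, String message) throws IOException {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(message);
            writer.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        String unicodeMessage = "\u4e16\u754c\u4f60\u597d\uff01";
        Path target = Files.createTempFile("messages", ".txt"); // placeholder location
        writeUtf8(target, unicodeMessage);
        // Reading back with the same charset recovers the original text.
        String readBack = Files.readAllLines(target, StandardCharsets.UTF_8).get(0);
        System.out.println(readBack.equals(unicodeMessage)); // true
        Files.delete(target);
    }
}
```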

answered Oct 21 '22 06:10

Alastair McCormack
