I am struggling to get Eclipse to read in Chinese characters correctly, and I am not sure where I may be going wrong.
Specifically, Chinese text (simplified or traditional) gets garbled somewhere between being read in from the console and being written back out. Even in a long string of mixed English and Chinese text, only the Chinese characters are affected; the English text comes out unchanged.
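One possible explanation for why only the Chinese characters change (an assumption, not a confirmed diagnosis): English letters are encoded as identical single bytes in UTF-8, ISO-8859-1, GBK and most other common charsets, so they survive when the writer and the reader disagree about the encoding. Each of these Chinese characters, by contrast, takes three bytes in UTF-8 and decodes to something different under another charset. The minimal sketch below (class and method names are my own) encodes mixed text as UTF-8 and decodes it as ISO-8859-1, a charset chosen only because every JVM is guaranteed to support it. The Chinese characters are written as Unicode escapes so the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text with one charset and decode the bytes with another,
    // simulating a writer and a reader that disagree about the encoding.
    static String misdecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String mixed = "English: fubar; Chinese: \u4E4B\u8B02\u751A;";
        String garbled = misdecode(mixed);
        // The ASCII prefix survives the mismatch unchanged...
        System.out.println(garbled.startsWith("English: fubar; Chinese: ")); // prints true
        // ...but the three Chinese characters turn into nine unrelated ones.
        System.out.println(mixed.length() + " -> " + garbled.length()); // prints 29 -> 35
    }
}
```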
I have cut the problem down to the following test example and annotated each stage with what I believe is happening. Note that I am a student, so I would very much like to confirm (or correct) my understanding :)
```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;

public class ChineseConsoleTest {

    public static void main(String[] args) {
        try {
            boolean isRunning = true;

            //Raw flow of input data from the console
            InputStream inputStream = System.in;
            //Allows you to read the stream, using either the default character encoding, else the specified encoding
            InputStreamReader inputStreamReader = new InputStreamReader(inputStream, "UTF-8");
            //Adds functionality for converting the stream being read in, into Strings(?)
            BufferedReader input_BufferedReader = new BufferedReader(inputStreamReader);

            //Raw flow of output data to the console
            OutputStream outputStream = System.out;
            //Write a stream, from a given bit of text
            OutputStreamWriter outputStreamWriter = new OutputStreamWriter(outputStream, "UTF-8");
            //Adds functionality to the base ability to write to a stream
            BufferedWriter output_BufferedWriter = new BufferedWriter(outputStreamWriter);

            while (isRunning) {
                System.out.println(); //force extra newline
                System.out.print("> ");
                System.out.flush(); //make sure the prompt is shown before blocking on input

                //To read in a line of text (as a String):
                String userInput_asString = input_BufferedReader.readLine();
                if (userInput_asString == null) {
                    isRunning = false; //end of input (stream closed), so stop instead of writing null
                    continue;
                }

                //To output a line of text:
                String outputToUser_fromString_englishFromCode = "foo"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_englishFromCode);
                output_BufferedWriter.flush();
                System.out.println(); //force extra newline

                String outputToUser_fromString_ChineseFromCode = "之謂甚"; //outputs correctly
                output_BufferedWriter.write(outputToUser_fromString_ChineseFromCode);
                output_BufferedWriter.flush();
                System.out.println(); //force extra newline

                String outputToUser_fromString_userSupplied = userInput_asString; //outputs correctly when given English text, garbled when given Chinese text
                output_BufferedWriter.write(outputToUser_fromString_userSupplied);
                output_BufferedWriter.flush();
                System.out.println(); //force extra newline
            }
        } catch (Exception e) {
            e.printStackTrace(); //report errors instead of silently swallowing them
        }
    }
}
```
Sample output:

```
> 之謂甚
foo
之謂甚
之謂甚
> oaea
foo
之謂甚
oaea
> mixed input - English: fubar; Chinese: 之謂甚;
foo
之謂甚
mixed input - English: fubar; Chinese: 之謂甚;
>
```
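To check the annotations in the code above, here is a sketch (my own, not the original program; the class and method names are hypothetical) that pushes one line through the same reader/writer layers, with in-memory byte streams standing in for System.in and System.out. Per the java.io documentation, InputStreamReader is the layer that decodes bytes into chars, BufferedReader only adds buffering and readLine(), and OutputStreamWriter encodes chars back into bytes; a String itself holds UTF-16 chars rather than bytes in any particular console encoding. When both ends agree on UTF-8, the Chinese text survives the round trip, which would suggest looking at which encoding the console actually uses for the bytes it hands to System.in.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {

    // Sends one line through the same decode/encode layers as the console code,
    // but with in-memory streams standing in for System.in and System.out.
    static byte[] roundTrip(byte[] consoleBytes) throws IOException {
        // bytes -> chars: InputStreamReader does the decoding
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(consoleBytes), StandardCharsets.UTF_8));
        String line = reader.readLine(); // a String holds UTF-16 chars, not bytes
        if (line == null) {
            line = ""; // empty input
        }

        // chars -> bytes: OutputStreamWriter does the encoding
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(sink, StandardCharsets.UTF_8))) {
            writer.write(line);
        } // closing the writer flushes everything into sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] typed = "\u4E4B\u8B02\u751A\n".getBytes(StandardCharsets.UTF_8);
        byte[] echoed = roundTrip(typed);
        String text = new String(echoed, StandardCharsets.UTF_8);
        System.out.println(text.equals("\u4E4B\u8B02\u751A")); // prints true
        System.out.println(echoed.length); // prints 9 (3 bytes per character in UTF-8)
    }
}
```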
What appears in this Stack Overflow post matches exactly what I see in the Eclipse console and in the Eclipse debugger (when viewing or editing the variable values). If I change a variable's value manually in the debugger, the code that depends on that value behaves as I would expect, which suggests that the problem lies in how the text is read IN.
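Since both the console and the debugger only show how a String is rendered, one way to check whether the text was read IN correctly, independently of how it is displayed, is to print the Unicode code point of every character. The sketch below does that; the helper name `toCodePoints` is hypothetical. Correctly decoded input of 之謂甚 should print U+4E4B U+8B02 U+751A, whereas a mis-decoded read would show different values, often U+FFFD replacement characters or more code points than characters typed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class CodePointDump {

    // Formats every code point in the string as U+XXXX, so the decoded value
    // can be checked without relying on how the console renders it.
    static String toCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line = in.readLine();
        if (line != null) {
            System.out.println(toCodePoints(line));
            // Compare against the expected text written with Unicode escapes,
            // so the source file's own encoding cannot affect the result.
            System.out.println("matches: " + line.equals("\u4E4B\u8B02\u751A"));
        }
    }
}
```

Typing 之謂甚 at the prompt and seeing anything other than `U+4E4B U+8B02 U+751A` would confirm that the text is already wrong at the moment it is read in.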
I have tried many different combinations of Scanners and buffered stream readers/writers, etc., for reading input and writing output, with and without explicitly specified character encodings. However, I did not do this particularly systematically, so I could easily have missed something.
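To make the trial and error with different encodings more systematic, one option is to capture the raw bytes once and decode them with each candidate charset, checking which one reproduces the text that was typed. The candidate list below is an assumption (UTF-8 plus common Chinese and Windows charsets), and the class and method names are my own. GBK, Big5 and windows-1252 are normally available in a full JDK but are not guaranteed on every JVM, so the sketch skips any name that `Charset.isSupported` rejects.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CharsetProbe {

    // Hypothetical candidate list; extend it with whatever charsets you suspect.
    static final String[] CANDIDATES = {"UTF-8", "GBK", "Big5", "windows-1252", "ISO-8859-1"};

    // Returns the names of the candidate charsets that decode rawBytes to expected.
    static List<String> matchingCharsets(byte[] rawBytes, String expected) {
        List<String> matches = new ArrayList<>();
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // not every JVM ships every charset
            }
            if (new String(rawBytes, Charset.forName(name)).equals(expected)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Stand-in for bytes captured from System.in; here they are UTF-8 on purpose.
        byte[] raw = "\u4E4B\u8B02\u751A".getBytes(StandardCharsets.UTF_8);
        System.out.println(matchingCharsets(raw, "\u4E4B\u8B02\u751A")); // prints [UTF-8]
    }
}
```

In the real program the raw bytes would have to be captured from System.in directly, for example with `System.in.read(buffer)`, before any Reader decodes them.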
I have tried to set the Eclipse environment to use UTF-8 wherever possible, but I may have missed a place or two. Note that the console correctly outputs hard-coded Chinese characters.
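To see which encoding settings the JVM launched by Eclipse actually ends up with, a small sketch like the following (class name hypothetical) prints the default charset and the related system properties. `native.encoding` only exists on JDK 17 and later, and `stdout.encoding` on JDK 19 and later, so either may print null on older versions. As far as I know, Eclipse's launch configuration also has its own Encoding setting on the Common tab of the Run Configurations dialog, so it may be worth comparing that setting with what this prints.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingReport {

    // Collects the JVM's view of its default and console-related encodings.
    static Map<String, String> report() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        info.put("file.encoding", System.getProperty("file.encoding"));
        info.put("native.encoding", System.getProperty("native.encoding")); // JDK 17+, else null
        info.put("stdout.encoding", System.getProperty("stdout.encoding")); // JDK 19+, else null
        return info;
    }

    public static void main(String[] args) {
        report().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```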
Any assistance / guidance on this matter is greatly appreciated :)