Reading Unicode characters in Java

Tags: java, file, unicode

I'm a bit new to Java. When I assign a Unicode string to a variable and print it, the actual characters are printed:

  String str = "\u0142o\u017Cy\u0142";
  System.out.println(str);

But when I read the string from a file and print each line, the output is different:

  final StringBuilder stringBuilder = new StringBuilder();
  InputStream inStream = new FileInputStream("C:/a.txt");
  final InputStreamReader streamReader = new InputStreamReader(inStream, "UTF-8");
  final BufferedReader bufferedReader = new BufferedReader(streamReader);
  String line = "";
  while ((line = bufferedReader.readLine()) != null) {
      System.out.println(line);
      stringBuilder.append(line);
  }

Why are the results different in the two cases? The file a.txt contains the same string, but when I print the contents of the file, the output is z\u0142o\u017Cy\u0142 instead of the actual Unicode characters. How can I get the file content printed the same way the string literal is printed?

Rakesh asked Apr 08 '26 07:04



2 Answers

Your code looks correct. My guess is that the file "a.txt" does not contain the Unicode characters encoded as UTF-8, but rather the literal escaped text "\u0142o\u017Cy\u0142".

Please check whether the text file is correct, using a UTF-8-aware editor such as recent versions of Notepad or Notepad++ on Windows. Alternatively, inspect it with your favorite hex editor: it should not contain backslashes.
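The same check can be done from Java. The following is a minimal sketch, not part of the original answer: the class name `EscapeCheck` and the helper `containsLiteralEscapes` are made up for illustration, and the default path is simply the one from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper class, introduced only for this illustration.
class EscapeCheck {
    // Matches a literal backslash, the letter u, and four hex digits.
    private static final Pattern LITERAL_ESCAPE = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    // True if the text contains escape sequences spelled out as plain characters.
    static boolean containsLiteralEscapes(String text) {
        return LITERAL_ESCAPE.matcher(text).find();
    }

    public static void main(String[] args) throws IOException {
        // Path from the question; pass another path as the first argument if needed.
        Path path = Paths.get(args.length > 0 ? args[0] : "C:/a.txt");
        String content = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println(containsLiteralEscapes(content)
                ? "The file contains escape sequences written out as text"
                : "No written-out escape sequences found");
    }
}
```

If the first message is printed, the file holds backslashes and hex digits rather than the UTF-8 bytes of the characters themselves.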

I tried it with "€" as the UTF-8-encoded content of the file, and it was printed correctly. Note that not every Unicode character can be displayed: that depends on your terminal encoding (really a hassle on Windows) and font.

AndiDog answered Apr 09 '26 20:04



The Java compiler interprets Unicode escapes in the source code, such as your \u0142, as if you had actually typed that character (Latin small letter L with stroke) into the source. Java does not interpret Unicode escapes in text that it reads from a file at run time.
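The difference can be seen by comparing string lengths. The sketch below is an illustration rather than part of the original answer. It also includes a hypothetical `unescape` helper (the name and the regex approach are assumptions) for the case where the file really must contain escaped text; it handles only the simple backslash, u, four-hex-digit form and does not treat a doubled backslash specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, introduced only for this illustration.
class UnicodeEscapeDemo {
    // A literal backslash, the letter u, and four hex digits (captured).
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each written-out escape sequence with the character it names.
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' and backslash in the replacement.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The compiler turns each escape in this literal into a single char.
        String fromSource = "\u0142o\u017Cy\u0142";
        // Doubled backslashes: this is what readLine() returns for escaped file content.
        String fromFile = "\\u0142o\\u017Cy\\u0142";

        System.out.println(fromSource.length());                   // 5
        System.out.println(fromFile.length());                     // 20
        System.out.println(unescape(fromFile).equals(fromSource)); // true
    }
}
```

Applying something like `unescape` to each line read from the file would make the printed output match the string literal, although storing the real characters in the file as UTF-8 is usually the simpler fix.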

If you take your String str = "\u0142o\u017Cy\u0142"; and write it to a file a.txt from your Java program, then open the file in an editor, you'll see the characters themselves in the file, not the \uNNNN sequence.

If you then run your original posted program against that a.txt file, you should see what you expected.
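The round trip described above could look roughly like this. It is a sketch under some assumptions: the class and method names are invented for illustration, and a temporary file stands in for a.txt so the example does not overwrite anything.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, introduced only for this illustration.
class Utf8RoundTrip {
    // Writes the text as UTF-8 bytes, then reads the first line back as UTF-8.
    static String writeThenReadFirstLine(Path path, String text) {
        try {
            Files.write(path, text.getBytes(StandardCharsets.UTF_8));
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same round trip through a temporary file that is deleted afterwards.
    static String roundTripViaTempFile(String text) {
        try {
            Path path = Files.createTempFile("a", ".txt");
            try {
                return writeThenReadFirstLine(path, text);
            } finally {
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String str = "\u0142o\u017Cy\u0142";
        String line = roundTripViaTempFile(str);
        System.out.println(line.equals(str)); // true
        System.out.println(line);             // display depends on console encoding and font
    }
}
```

The file written this way contains the encoded characters themselves, so the reading code from the question gets back the same string without any escape handling.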

Stephen P answered Apr 09 '26 19:04



