Reading strange unicode character in Java?

I have the following text file:

[image: screenshot of the text file]

The file was saved with UTF-8 encoding.

I used the following code to read the content of the file:

FileReader fr = new FileReader("f.txt");
BufferedReader br = new BufferedReader(fr);
String s1 = br.readLine();
String s2 = br.readLine();
System.out.println("s1 = " + s1.length());
System.out.println("s2 = " + s2.length());

The output:

s1 = 5
s2 = 4

Then I called s1.charAt(0) to get the first character of s1, and it was a '' (blank) character, which is why s1 has a length of 5. Even after calling s1.trim(), its length is still 5. I don't know why this happens. It works correctly if the file is saved with ASCII encoding.

ipkiss asked Jan 18 '23 02:01



1 Answer

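Notepad apparently saved the file with a byte order mark (BOM), a nonprintable character (U+FEFF) at the beginning of the file that merely marks it as UTF-8. UTF-8 does not require it, and using it is not recommended. Java's UTF-8 decoder does not strip it, so it shows up as the first character of s1, and trim() leaves it in place because it is not a character that trim() removes. You can ignore or remove it; other text editors often give you the choice of saving UTF-8 with or without a BOM.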
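One way to remove the mark in code is to check the first character of the first line and drop it if it is U+FEFF. The following is only a minimal sketch of that idea, not the only way to handle it: the class name BomStrip, the helper stripBom, and the sample contents "abcd" / "efgh" are made up for illustration (the real file contents are not known), and the reader is opened with an explicit UTF-8 charset instead of relying on FileReader's platform default.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomStrip {
    // U+FEFF is the character that the UTF-8 bytes EF BB BF decode to.
    static final char BOM = '\uFEFF';

    // Hypothetical helper: returns the line without a leading BOM, if present.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Build a temporary file that starts with a BOM, followed by
        // two made-up four-character lines (assumed sample data).
        Path file = Files.createTempFile("bom-demo", ".txt");
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] text = "abcd\nefgh\n".getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[bom.length + text.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(text, 0, all, bom.length, text.length);
        Files.write(file, all);

        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String raw = br.readLine();
            String s1 = stripBom(raw);
            String s2 = br.readLine();
            System.out.println("raw     = " + raw.length());        // 5: BOM + 4 characters
            System.out.println("trimmed = " + raw.trim().length()); // 5: trim() keeps U+FEFF
            System.out.println("s1      = " + s1.length());         // 4
            System.out.println("s2      = " + s2.length());         // 4
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Only the first line of a file can carry the mark, so the check is needed once per file rather than on every readLine() call.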

Michael Borgwardt answered Jan 31 '23 02:01
