I have the following text file:
The file was saved with utf-8 encoding.
I used the following code to read the content of the file:
FileReader fr = new FileReader("f.txt");
BufferedReader br = new BufferedReader(fr);
String s1 = br.readLine();
String s2 = br.readLine();
System.out.println("s1 = " + s1.length());
System.out.println("s2 = " + s2.length());
The output:
s1 = 5
s2 = 4
Then I used s1.charAt(0); to get the first character of s1, and it was '' (a blank character). That's why s1 has a length of 5. Even after calling s1.trim(); its length is still 5.
I don't know why this happens. It works correctly if the file is saved with ASCII encoding.
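Notepad apparently saved the file with a byte order mark (BOM), a nonprintable character at the beginning that just marks the file as UTF-8. It is not required for UTF-8, and indeed not recommended. Java's reader does not strip it, so it shows up as the character U+FEFF at the start of the first line, which is why s1 is one character longer than expected. trim() does not help because it only removes characters with code points up to U+0020 (space), and U+FEFF is outside that range. A file saved as ASCII has no BOM, which is why that case works. You can ignore or remove the BOM; other text editors often give you the choice of saving UTF-8 with or without one.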
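If you cannot re-save the file without the BOM, one option is to drop it in code after reading the first line. The following is only a minimal sketch: the class name BomDemo and the helper stripBom are made up for illustration, and the file contents ("abcd" and "efgh") are placeholders chosen to match the lengths in the question, since the original file is not shown. It writes a temporary file that starts with the UTF-8 BOM bytes, reads it back with an explicit UTF-8 charset, and strips a leading U+FEFF if one is present.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class BomDemo {
    // The UTF-8 bytes EF BB BF decode to this single character.
    static final char BOM = '\uFEFF';

    // Hypothetical helper: returns the string without a leading BOM, if any.
    static String stripBom(String s) {
        if (s != null && !s.isEmpty() && s.charAt(0) == BOM) {
            return s.substring(1);
        }
        return s;
    }

    // Reads the first two lines and returns their lengths, ignoring a BOM on line one.
    static int[] readLengths(Path file) throws IOException {
        // An explicit charset avoids depending on the platform default.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String s1 = stripBom(br.readLine());
            String s2 = br.readLine();
            return new int[] { s1.length(), s2.length() };
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder contents: BOM bytes followed by two four-character lines.
        byte[] bom = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };
        byte[] text = "abcd\nefgh\n".getBytes(StandardCharsets.UTF_8);
        byte[] all = new byte[bom.length + text.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(text, 0, all, bom.length, text.length);

        Path file = Files.createTempFile("f", ".txt");
        try {
            Files.write(file, all);
            int[] lengths = readLengths(file);
            System.out.println("s1 = " + lengths[0]); // prints "s1 = 4"
            System.out.println("s2 = " + lengths[1]); // prints "s2 = 4"
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Checking only the first character of the first line keeps the change small; a U+FEFF that appears anywhere else in the file is left untouched.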