There is a file named "dd.txt" on my disk. Its content is
\u5730\u7406
Now, when I run this program:
public static void main(String[] args) throws IOException {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    // try-with-resources closes the stream even if reading fails
    try (FileInputStream fis = new FileInputStream("d:\\dd.txt")) {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = fis.read(buffer)) != -1) {
            baos.write(buffer, 0, n); // write only the bytes actually read
        }
    }
    String s1 = "\u5730\u7406";
    String s2 = baos.toString("utf-8");
    System.out.println("s1:" + s1 + "\n" + "s2:" + s2);
}
I get two different results:
s1:地理
s2:\u5730\u7406
Can you tell me why? And how can I read that file and get the same result as s1, in Chinese?
The readString() method of the Files class (java.nio.file.Files) reads the contents of the specified file and returns them as a String. Note: Files.readString() was introduced in Java 11.
Below is a code snippet that reads a file into a String using BufferedReader:
BufferedReader reader = new BufferedReader(new FileReader(fileName));
StringBuilder stringBuilder = new StringBuilder();
String line = null;
String ls = System.getProperty("line.separator");
while ((line = reader.
When you write \u5730 in Java code, the compiler interprets it as a single Unicode character (a Unicode escape). When you write the same text to a file, it is just 6 regular characters, because nothing interprets it. Is there a reason why you're not writing 地理 directly to the file?
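The difference can be checked with a small sketch. The class and method names below are made up for illustration. The first string uses an escape that the compiler translates, and the second uses a doubled backslash so the six characters stay literal, just as they do in the file:

```java
class EscapeLengthDemo {
    // The compiler turns this escape into one character before the string exists.
    static int compiledLength() {
        return "\u5730".length();
    }

    // The doubled backslash keeps the text literal: backslash, u, 5, 7, 3, 0.
    static int rawLength() {
        return "\\u5730".length();
    }

    public static void main(String[] args) {
        System.out.println(compiledLength()); // prints 1
        System.out.println(rawLength());      // prints 6
    }
}
```

The second case is what the program in the question reads from dd.txt, which is why s2 prints the escape text unchanged.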
If you wish to read a file containing Unicode escapes, you'll need to parse the values yourself: strip the \u prefix and convert the four hex digits that follow into a character. If you control the creation of the file, it's a lot easier to write the actual characters with a suitable encoding (e.g. UTF-8) in the first place. Under normal circumstances you should not come across files containing these escaped literals.
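As a rough illustration of the "write proper Unicode" route, the sketch below writes the characters themselves as UTF-8 and reads them back. It uses a temporary file rather than d:\dd.txt, and the class and method names are assumptions made for this example:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8RoundTrip {
    // Writes the text as UTF-8 bytes to a temp file, then decodes it again.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("dd", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s1 = "\u5730\u7406";   // two characters after compilation
        String s2 = roundTrip(s1);    // the same two characters, read from disk
        System.out.println(s1.equals(s2)); // prints true
    }
}
```

With the file stored this way, decoding the bytes as UTF-8 is enough and no escape parsing is needed.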
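In your Java code, the \uxxxx sequences are interpreted as Unicode escapes, so they are shown as Chinese characters. This happens only because the compiler translates them when it reads the source file. Nothing performs that translation on text read from a file at run time.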
To obtain the same result, you have to do some parsing yourself:
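// Split on the literal backslash-u prefix, leaving the hex digits.
String[] hexCodes = s2.split("\\\\u");
for (String hexCode : hexCodes) {
    if (hexCode.isEmpty()) {
        continue; // skip the empty piece before the first escape
    }
    // trim() guards against a trailing newline in the file
    int intValue = Integer.parseInt(hexCode.trim(), 16);
    System.out.print((char) intValue);
}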
(Note that this only works if every character is in Unicode escape form, e.g. \uxxxx.)
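If the file may mix escapes with ordinary text, a regex-based variant is one possible way around that limitation. This is only a sketch: the class name UnicodeEscapeDecoder and its decode method are made up for illustration, and it assumes every escape has exactly four hex digits (so characters outside the Basic Multilingual Plane would have to appear as two escapes, as they do in Java source).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the plain text between the previous escape and this one.
            out.append(input, last, m.start());
            // Convert the four hex digits into the character they name.
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        // Copy whatever follows the final escape.
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String s2 = "\\u5730\\u7406"; // same twelve characters as in the file
        System.out.println("s2:" + decode(s2));
    }
}
```

Text that is not an escape passes through unchanged, so a stray newline or some plain ASCII in the file no longer causes a NumberFormatException.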