What's the difference between a string in the source code and a string read from a file?

There is a file named "dd.txt" on my disk. Its content is \u5730\u7406

Now, when I run this program:

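import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // try-with-resources closes the stream even if reading fails
        try (FileInputStream fis = new FileInputStream("d:\\dd.txt")) {
            byte[] buffer = new byte[4096];
            int n;
            // Copy only the bytes actually read on each pass
            while ((n = fis.read(buffer)) != -1) {
                baos.write(buffer, 0, n);
            }
        }
        String s1 = "\u5730\u7406";
        String s2 = baos.toString("utf-8");
        System.out.println("s1:" + s1 + "\n" + "s2:" + s2);
    }
}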

I get two different results:

s1:地理
s2:\u5730\u7406

Can you tell me why? And how can I read that file and get the same result as s1, in Chinese?


asked Jul 14 '15 07:07

Paul Wang

2 Answers

When you write \u5730 in Java source code, the compiler interprets it as a single Unicode character (a Unicode escape). When you write the same text to a file, it is just six ordinary characters, because nothing interprets it. Is there a reason why you're not writing 地理 directly to the file?
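To make the difference concrete, here is a minimal sketch that compares the two strings by length. The class name EscapeLengthDemo is only illustrative, and the second literal doubles its backslashes so that it holds the same twelve characters the file contains.

```java
class EscapeLengthDemo {
    // Two characters: the compiler translates each escape before the string exists
    static final String FROM_SOURCE = "\u5730\u7406";
    // Twelve characters: the doubled backslashes keep the text literal, as in dd.txt
    static final String FROM_FILE = "\\u5730\\u7406";

    public static void main(String[] args) {
        System.out.println(FROM_SOURCE.length());          // 2
        System.out.println(FROM_FILE.length());            // 12
        System.out.println(FROM_SOURCE.equals(FROM_FILE)); // false
    }
}
```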

If you need to read a file that contains these escapes, you will have to parse them yourself: strip the \u prefix and convert the four hex digits that follow into a character. If you control how the file is created, it is much easier to write the actual characters with a suitable encoding (e.g. UTF-8) in the first place. Under normal circumstances you should never come across files containing these escaped Unicode sequences.
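As a rough illustration of the second suggestion, the sketch below writes the real characters to a file as UTF-8 and reads them back. It assumes Java 8 or later; the class name Utf8RoundTrip and the temporary file standing in for dd.txt are made up for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8RoundTrip {
    // Writes the text to a temporary file as UTF-8, then decodes the same bytes back
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("dd-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s1 = "\u5730\u7406";
        String s2 = roundTrip(s1);
        System.out.println(s1.equals(s2)); // true
    }
}
```

With the file stored this way, no escape parsing is needed. The only requirement is that the reader decodes with the same charset the writer used.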


answered Sep 28 '22 16:09

Kayaman

In your Java code, the \uxxxx sequences are interpreted as Unicode escapes, so they are shown as Chinese characters. This happens only because the Java compiler translates them when it reads the source file; nothing does that for text read from a file at run time.

To obtain the same result, you have to do some parsing yourself:

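// Split on the literal two-character sequence: a backslash followed by 'u'
String[] hexCodes = s2.split("\\\\u");
for (String hexCode : hexCodes) {
    // The text before the first escape is an empty string; skip it
    if (hexCode.isEmpty()) {
        continue;
    }
    // Parse the four hex digits and print them as one char (UTF-16 code unit)
    int intValue = Integer.parseInt(hexCode, 16);
    System.out.print((char) intValue);
}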

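(Note that this only works if every character is in Unicode escape form, e.g. \uxxxx. Any plain text between the escapes, or a trailing newline, makes Integer.parseInt throw a NumberFormatException.)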
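If the file may mix escapes with ordinary text, a regular expression is one way to lift that restriction. This is only a sketch, not part of the answer above: the class name UnicodeUnescaper and the method unescape are hypothetical, and it replaces each backslash-u sequence followed by four hex digits while leaving everything else untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // A backslash, 'u', then exactly four hex digits (captured as group 1)
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Convert the hex digits into a single char (UTF-16 code unit)
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps the decoded char from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        // Copy whatever follows the last escape unchanged
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u5730\\u7406"));
        System.out.println(unescape("geo: \\u5730\\u7406"));
    }
}
```

Applied to the question's program, this would be called as unescape(s2) after the file has been read.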

answered Sep 28 '22 18:09

Glorfindel


Tags: java, string, file-io, utf-8
