Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Is a character 1 byte or 2 bytes in Java?

I thought characters in java are 16 bits as suggested in java doc. Isn't it the case for strings? I have a code that stores an object into a file:

public static void storeNormalObj(File outFile, Object obj) {
    FileOutputStream fos = null;
    ObjectOutputStream oos = null;
    try {
        fos = new FileOutputStream(outFile);
        oos = new ObjectOutputStream(fos);
        oos.writeObject(obj);
        oos.flush();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        try {
            oos.close();
            try {
                fos.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Basically, I tried to store an string "abcd" in to file "output", when I opened up output with an editor and deleted the none string part, what's left is just the string "abcd", which is 4 bytes in total. Anyone knows why? Does java automatically saves space by using ASCII instead of UNICODE for Strings that can be supported by ASCII? Thanks

like image 694
user685275 Avatar asked May 13 '11 06:05

user685275


People also ask

Is char 2 bytes in Java?

And, every char is made up of 2 bytes because Java internally uses UTF-16. For instance, if a String contains a word in the English language, the leading 8 bits will all be 0 for every char, as an ASCII character can be represented using a single byte.

Is a char 1 byte in Java?

Because java is unicode based and c is ASCII code based and java contains 18 languages whereas c contains only 256 character. 256 is represented in 1 byte but 65535 can't represent in 1 byte so java char size is 2 byte or c char size is 1 byte.

Is a char 1 or 2 bytes?

The char type takes 1 byte of memory (8 bits) and allows expressing in the binary notation 2^8=256 values. The char type can contain both positive and negative values. The range of values is from -128 to 127.


1 Answers

(I think by "none string part" you are referring to the bytes that ObjectOutputStream emits when you create it. It is possible you don't want to use ObjectOutputStream, but I don't know your requirements.)

Just FYI, Unicode and UTF-8 are not the same thing. Unicode is a standard that specifies, amongst other things, what characters are available. UTF-8 is a character encoding that specifies how these characters shall be physically encoded in 1s and 0s. UTF-8 can use 1 byte for ASCII (<= 127) and up to 4 bytes to represent other Unicode characters.

UTF-8 is a strict superset of ASCII. So even if you specify a UTF-8 encoding for a file and you write "abcd" to it, it will contain just those four bytes: they have the same physical encoding in ASCII as they do in UTF-8.

Your method uses ObjectOutputStream which actually has a significantly different encoding than either ASCII or UTF-8! If you read the Javadoc carefully, if obj is a string and has already occurred in the stream, subsequent calls to writeObject will cause a reference to the previous string to be emitted, potentially causing many fewer bytes to be written in the case of repeated strings.

If you're serious about understanding this, you really should spend a good amount of time reading about Unicode and character encoding systems. Wikipedia has an excellent article on Unicode as a start.

like image 185
sjr Avatar answered Sep 22 '22 04:09

sjr