
What character encoding does ObjectOutputStream's writeObject method use?

I read that Java uses UTF-16 encoding internally. That is, I understand that if I have String var = "जनमत"; then "जनमत" will be encoded in UTF-16 internally. So, if I dump this variable to a file as below:

FileOutputStream fileOut = new FileOutputStream("output.xyz");
ObjectOutputStream out = new ObjectOutputStream(fileOut);
out.writeObject(var);
out.close(); // flushes buffered data to output.xyz

will the string "जनमत" in the file "output.xyz" be encoded as UTF-16? Also, if I later read the file "output.xyz" via ObjectInputStream, will I be able to get the UTF-16 representation of the variable?

Thanks.

Bikash Gyawali asked Dec 08 '10 17:12


2 Answers

So, If I dump this variable to some file... will the encoding of the string "जनमत" in the file "output.xyz" be in UTF-16?

The encoding of your string in the file will be in whatever format the ObjectOutputStream wants to put it in. You should treat it as a black box that can only be read by an ObjectInputStream. (Seriously - even though the format is IIRC well-documented, if you want to read it with some other tool, you should serialise the object yourself as XML or JSON or whatever.)

Later on if I want to read from the file "output.xyz" via ObjectInputStream, will I be able to get the UTF-16 representation of the variable?

If you read the file with an ObjectInputStream, you'll get a copy of the original object back. This will be a java.lang.String, which is just a sequence of characters (not bytes), from which you could get the UTF-16 representation if you wished via getBytes("UTF-16") (the no-argument getBytes() uses the platform default charset, not UTF-16), though I suspect you don't actually need to.
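A minimal sketch of that round trip, assuming an in-memory buffer in place of "output.xyz" so it runs without touching the file system. The class name StringRoundTrip and the helper roundTrip are made up for illustration; the stream classes and getBytes(Charset) are standard JDK API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

public class StringRoundTrip {
    // Hypothetical helper: serializes a String with writeObject into memory
    // and reads it back with readObject.
    static String roundTrip(String value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (String) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The question's Devanagari string, spelled with Unicode escapes so
        // the encoding of this source file does not matter.
        String var = "\u091C\u0928\u092E\u0924";
        String copy = roundTrip(var);
        System.out.println(copy.equals(var)); // true

        // Ask for UTF-16 explicitly; UTF_16BE emits no byte-order mark.
        byte[] utf16 = copy.getBytes(StandardCharsets.UTF_16BE);
        System.out.println(utf16.length); // 8 (4 chars x 2 bytes)
    }
}
```

The UTF-16 bytes here come from the String after it has been read back, not from the serialized file itself.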


In conclusion, don't worry too much about the internal details of serialization. If you need to know what's going on, create the file yourself; and if you're just curious, trust in the JVM to do the right thing.

Andrzej Doyle answered Oct 12 '22 11:10


Close: the internal format was originally UCS-2 and is UTF-16 today; either way, a String uses 2 bytes (one char) for most characters, and a sequence of 2 chars, i.e. 4 bytes, for some rarely used code points.

ObjectOutputStream writes strings in something called modified UTF-8, which is like UTF-8 except that the zero character (U+0000) is written as a 2-byte sequence. That sequence is not legal in standard UTF-8, which requires the shortest encoding of each code point, but it naturally decodes back to the value 0.
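To see the difference in bytes, here is a small sketch that uses DataOutputStream.writeUTF, which emits a 2-byte length followed by modified UTF-8. The class ModifiedUtf8Demo and its helpers are hypothetical names for illustration. The last case goes slightly beyond the zero character: as far as I know, modified UTF-8 also writes a supplementary code point as two 3-byte surrogates rather than one 4-byte sequence.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Returns what writeUTF emits: a 2-byte length, then modified UTF-8.
    static byte[] modifiedUtf8(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String zero = "\u0000";
        System.out.println(hex(modifiedUtf8(zero)));                      // 00 02 C0 80
        System.out.println(hex(zero.getBytes(StandardCharsets.UTF_8)));   // 00

        // The question's string: ordinary BMP characters encode the same
        // way in both forms, 3 bytes per Devanagari character.
        String var = "\u091C\u0928\u092E\u0924";
        System.out.println(hex(modifiedUtf8(var)));
        // 00 0C E0 A4 9C E0 A4 A8 E0 A4 AE E0 A4 A4

        // A supplementary code point (U+1F600) as a surrogate pair.
        String supplementary = "\uD83D\uDE00";
        System.out.println(modifiedUtf8(supplementary).length - 2);                  // 6
        System.out.println(supplementary.getBytes(StandardCharsets.UTF_8).length);   // 4
    }
}
```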

But what you are really asking is "does it work so that I write a String, read a String", and the answer to that is yes. The JDK does the proper encoding when writing bytes out and the proper decoding when reading them back.

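For what it's worth, you may be better off using the writeUTF() method for Strings, since I think the resulting output is a bit more compact; writeObject() also works, it just needs a bit more metadata. Note that a String written with writeUTF() must be read back with readUTF(), not readObject().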
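Rather than guessing at the size difference, it can be measured. The sketch below writes the same String both ways through an ObjectOutputStream into memory and prints the byte counts. SerializedSizeDemo and its helper names are made up for illustration, and the numbers are deliberately left for you to observe on your own JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class SerializedSizeDemo {
    // Serializes the string as an object (stream header + string record).
    static byte[] viaWriteObject(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Writes the string as primitive data with writeUTF on the same stream type.
    static byte[] viaWriteUTF(String value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeUTF(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Data written with writeUTF has to be read back with readUTF.
    static String readBackUTF(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The question's Devanagari string, spelled with Unicode escapes.
        String var = "\u091C\u0928\u092E\u0924";
        byte[] asObject = viaWriteObject(var);
        byte[] asUtf = viaWriteUTF(var);
        System.out.println("writeObject: " + asObject.length + " bytes");
        System.out.println("writeUTF:    " + asUtf.length + " bytes");
        System.out.println(readBackUTF(asUtf).equals(var));
    }
}
```

For a single short string, expect the two sizes to differ by only a few bytes; the choice between them is more about whether you want object semantics (readObject) or plain data (readUTF) on the reading side.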

StaxMan answered Oct 12 '22 10:10