I read that Java uses UTF-16 encoding internally. i.e. I understand that if I have like: String var = "जनमत"; then the "जनमत" will be encoded in UTF-16 internally. So, If I dump this variable to some file such as below:
fileOut = new FileOutputStream("output.xyz");
out = new ObjectOutputStream(fileOut);
out.writeObject(var);
will the encoding of the string "जनमत" in the file "output.xyz" be in UTF-16? Also, later on if I want to read from the file "output.xyz" via ObjectInputStream, will I be able to get the UTF-16 representation of the variable?
Thanks.
An ObjectOutputStream writes primitive data types and graphs of Java objects to an OutputStream. The objects can be read (reconstituted) using an ObjectInputStream. Persistent storage of objects can be accomplished by using a file for the stream.
Return Value This method returns the object read from the stream.
writeObject(Object obj) method writes the specified object to the ObjectOutputStream. The class of the object, the signature of the class, and the values of the non-transient and non-static fields of the class and all of its supertypes are written.
With the FileOutputStream created, we simply pass the instance to the constructor of the ObjectOutputStream. The combination of the FileOutputStream and the ObjectOutputStream will allow Java object serialization to happen at the file level.
So, If I dump this variable to some file... will the encoding of the string "जनमत" in the file "output.xyz" be in UTF-16?
The encoding of your string in the file will be in whatever format the ObjectOutputStream
wants to put it in. You should treat it as a black box that can only be read by an ObjectInputStream
. (Seriously - even though the format is IIRC well-documented, if you want to read it with some other tool, you should serialise the object yourself as XML or JSON or whatever.)
Later on if I want to read from the file "output.xyz" via ObjectInputStream, will I be able to get the UTF-16 representation of the variable?
If you read the file with an ObjectInputStream
, you'll get a copy of the original object back. This will include a java.lang.String
, which is a just stream of characters (not bytes) - from which you could get the UTF-16 representation if you wished via the getBytes() method (though I suspect you don't actually need to).
In conclusion, don't worry too much about the internal details of serialization. If you need to know what's going on, create the file yourself; and if you're just curious, trust in the JVM to do the right thing.
Close: it is not exactly UTF-16, but something like UCS-2; but either way it does use 2 bytes for most characters (and sequence of 2 chars, i.e. 4 bytes for some rarely used code points).
ObjectOutputStream uses something called modified UTF-8, which is like UTF-8 but where zero character is expressed as 2-byte sequence which is not legal as per UTF-8 (due to uniqueness restrictions of encoding), but that sort of naturally decodes back to value 0.
But what you are really asking is "does it work so that I write a String, read a String" -- and answer to that is yes. JDK does proper encoding when writing bytes out, and decoding when reading.
For what it's worth, you are better of using "writeUTF()" method for Strings, since I think resulting output is bit more compact. but "writeObject()" also works, just needs bit more metadata.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With