I am trying to read data from a binary stream, portions of which should be parsed as UTF-8.
Using the InputStream directly for the binary data and an InputStreamReader on top of it for the UTF-8 text does not work, because the reader reads ahead and consumes bytes that belong to the subsequent binary data, even when it is told to read at most n characters.
I recognize this question is very similar to "Read from InputStream in multiple formats", but the solution proposed there is specific to HTTP streams, which does not help me.
I thought of just reading everything as binary data and converting the relevant pieces to text afterwards. But I only have the length information of the character data in characters, not in bytes. Thus, I need the thing which reads characters from the stream to be aware of the encoding.
Is there a way to tell InputStreamReader not to read ahead further than is needed for reading the given number of characters? Or is there a reader that supports both binary data and text with an encoding and can be switched between these modes on the fly?
I think you simply should not use InputStreamReader. Readers deal with text, but you are dealing with text and binary data together.
There is no way to do this. You have to read binary buffers and interpret your format yourself, i.e. find the position of the text, extract its bytes, and convert them to a String.
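Since the question only gives the text length in characters, one way to "find the position of the text" is to walk the UTF-8 byte sequences yourself: the lead byte of each sequence tells you how many continuation bytes follow, so you can consume exactly the bytes of n characters and nothing more. The following is a minimal sketch under stated assumptions: the count is taken to be Unicode code points (a 4-byte UTF-8 sequence would count as two Java `char`s), and the class and method names `Utf8CharReader` / `readUtf8CodePoints` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Utf8CharReader {
    // Hypothetical helper: reads exactly 'count' UTF-8 encoded code points from 'in',
    // consuming only the bytes that belong to them (no read-ahead), and decodes them.
    static String readUtf8CodePoints(InputStream in, int count) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < count; i++) {
            int lead = in.read();
            if (lead < 0) throw new EOFException("stream ended inside text");
            int extra;
            if (lead < 0x80) extra = 0;                // 0xxxxxxx: 1-byte sequence
            else if ((lead & 0xE0) == 0xC0) extra = 1; // 110xxxxx: 2-byte sequence
            else if ((lead & 0xF0) == 0xE0) extra = 2; // 1110xxxx: 3-byte sequence
            else if ((lead & 0xF8) == 0xF0) extra = 3; // 11110xxx: 4-byte sequence
            else throw new IOException("invalid UTF-8 lead byte: 0x" + Integer.toHexString(lead));
            buf.write(lead);
            // Continuation bytes are copied but not validated here; new String(...)
            // substitutes U+FFFD for any malformed sequence.
            for (int j = 0; j < extra; j++) {
                int b = in.read();
                if (b < 0) throw new EOFException("stream ended inside a UTF-8 sequence");
                buf.write(b);
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed layout for the demo: 3 code points of UTF-8 text, then a big-endian int.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write("a\u00e9\u20ac".getBytes(StandardCharsets.UTF_8)); // 1 + 2 + 3 bytes
        out.write(new byte[] {0, 0, 0, 42});
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(out.toByteArray()));
        String s = readUtf8CodePoints(dis, 3);
        int after = dis.readInt(); // binary data is still intact after the text
        System.out.println(s.codePointCount(0, s.length()) + " " + after); // prints "3 42"
    }
}
```

Reading one byte at a time is slow on an unbuffered stream; wrapping the source in a BufferedInputStream is fine as long as every binary read also goes through that same buffered stream.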
To simplify this task, I'd recommend creating your own class (say, ProtocolRecord) that implements Serializable and contains all of your fields. You then have two options:
(1) The simple one: use the Java serialization mechanism. Wrap your stream in an ObjectInputStream for reading and an ObjectOutputStream for writing, then read and write your objects. The disadvantage of this approach is that you cannot control your protocol.
(2) Implement the private readObject() and writeObject() methods yourself, still using ObjectInputStream and ObjectOutputStream as above. Because those classes implement DataInput and DataOutput, your methods can use readInt(), readUTF() and so on. In this case you do have to write the encoding of your fields yourself, but at least it is encapsulated in your class.
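Option (2) can be sketched as follows. This is a minimal illustration, assuming a record with one int, one string and one double (the field names are hypothetical); the fields are marked transient so that the custom writeObject/readObject methods alone decide how they are encoded.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ProtocolRecord implements Serializable {
    private static final long serialVersionUID = 1L;

    // Hypothetical fields, transient so default serialization skips them;
    // writeObject/readObject below encode them explicitly.
    transient int num;
    transient String text;
    transient double d;

    ProtocolRecord(int num, String text, double d) {
        this.num = num;
        this.text = text;
        this.d = d;
    }

    // Invoked by ObjectOutputStream.writeObject(record).
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // writes the (empty) non-transient state
        out.writeInt(num);
        out.writeUTF(text);       // 2-byte length prefix + modified UTF-8
        out.writeDouble(d);
    }

    // Invoked by ObjectInputStream.readObject(); must read in the same order.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        num = in.readInt();
        text = in.readUTF();
        d = in.readDouble();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(new ProtocolRecord(7, "h\u00e9llo", 2.5));
        }
        try (ObjectInputStream ois =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            ProtocolRecord r = (ProtocolRecord) ois.readObject();
            System.out.println(r.num + " " + r.text.length() + " " + r.d); // prints "7 5 2.5"
        }
    }
}
```

Note that the stream still carries Java serialization headers around these fields, so this keeps the encoding in one place but does not by itself let you read a binary format produced by something other than Java serialization.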
I think DataInputStream is what you need.
Read the binary portions directly. When you reach a portion of bytes that needs UTF-8 decoding, extract those bytes and decode them.
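```java
import java.io.DataInputStream;
import java.nio.charset.StandardCharsets;

// 'in' is your underlying binary InputStream (name assumed for illustration).
DataInputStream dis = new DataInputStream(in);
// read a binary type.
int num = dis.readInt();
// read the byte length of the following UTF-8 portion.
int len = dis.readUnsignedShort();
// read exactly len bytes, then decode them as UTF-8.
byte[] bytes = new byte[len];
dis.readFully(bytes);
String text = new String(bytes, StandardCharsets.UTF_8);
// read some more binary data.
double d = dis.readDouble();
```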
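To see that the reading steps in the snippet above consume exactly the right bytes, here is a self-contained round trip. It assumes a hypothetical layout (an int, an unsigned 16-bit byte length, the UTF-8 bytes, then a double) and a hypothetical `encode` helper that produces it with DataOutputStream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class MixedStreamDemo {
    // Hypothetical writer for the assumed layout: int, u16 byte length, UTF-8 bytes, double.
    static byte[] encode(int num, String text, double d) throws IOException {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 0xFFFF) throw new IllegalArgumentException("text too long for a 16-bit length");
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(bos);
        dos.writeInt(num);
        dos.writeShort(utf8.length); // low 16 bits; read back with readUnsignedShort()
        dos.write(utf8);
        dos.writeDouble(d);
        dos.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(42, "gr\u00fc\u00dfe", 3.5); // 5 chars, 7 UTF-8 bytes
        // Reader side: the same steps as the DataInputStream snippet.
        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data));
        int num = dis.readInt();
        int len = dis.readUnsignedShort();
        byte[] bytes = new byte[len];
        dis.readFully(bytes);
        String text = new String(bytes, StandardCharsets.UTF_8);
        double d = dis.readDouble();
        System.out.println(num + " " + len + " " + text.length() + " " + d); // prints "42 7 5 3.5"
    }
}
```

This layout resembles what DataOutputStream.writeUTF()/DataInputStream.readUTF() use, but those methods encode modified UTF-8, which differs from standard UTF-8 for NUL and supplementary characters; decoding the bytes yourself avoids that mismatch.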