Java InputStream automatically splits socket messages

I have a really strange behavior in Java and I can't tell whether this happens on purpose or by chance.

I have a socket connection to a server that sends me a response to a request. I am reading this response from the socket with the following loop, which is wrapped in a try-with-resources statement.

try (BufferedInputStream remoteInput = new BufferedInputStream(remoteSocket.getInputStream())) {
    final byte[] response = new byte[512];
    int bytes_read;
    // read() blocks until some bytes are available and returns -1 at end of stream
    while ((bytes_read = remoteInput.read(response, 0, response.length)) != -1) {
        // Message parsing, which does not affect the behaviour
    }
}

According to my understanding the "read" Method fills as many bytes as possible into the byte Array. The limiting factors are either the amount of received bytes or the size of the array.

Unfortunately, this is not what's happening: the protocol I'm transmitting answers my request with several smaller answers, which are sent one after another over the same socket connection.

In my case the "read" method always returns with exactly one of those smaller answers in the array. The length of the answers varies, but the 512 bytes that fit into the array are always enough. This means my array always contains only one message, and the unused rest of the array remains untouched.

If I intentionally make the byte array smaller than my messages, read returns several completely filled arrays and one last array that contains the remaining bytes of the message.

(A 100 byte answer with an array length of 30 returns three completely filled arrays and one with only 10 bytes used)

The InputStream or a socket connection in general shouldn't interpret the transmitted bytes in any way which is why I am very confused right now. My program is not aware of the used protocol in any way. In fact, my entire program is only this loop and the stuff you need to establish a socket connection.

If I could rely on this behavior, it would make parsing the response extremely easy, but since I do not know what causes it in the first place, I don't know whether I can count on it.

The protocol I'm transmitting is LDAP but since my program is completely unaware of that, that shouldn't matter.

thaasoph asked Mar 10 '23 06:03


2 Answers

According to my understanding the "read" Method fills as many bytes as possible into the byte Array.

Your understanding is incorrect. The whole point of that method returning the "number of bytes read" is that it might return any number up to the requested length. To be precise: for a blocking read with a non-zero length, the method returns only once it has read something, so it returns a number >= 1 (or -1 at end of stream).

In other words: you should never ever rely on read() reading a specific number of bytes. You always always always check the returned number; and if you are waiting for a certain amount of data, then you have to handle that in your own code (for example by buffering again until you have "enough" bytes in your own buffer to proceed).
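As a minimal sketch of that "buffer until you have enough" idea: the loop below keeps calling read() until the requested number of bytes has arrived. The names ReadExactly, readExactly and trickle are made up for this illustration; trickle only simulates a stream that hands out data in small pieces, the way a socket may.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ReadExactly {

    /** Blocks until exactly 'length' bytes are stored in 'buffer', or throws at end of stream. */
    static void readExactly(InputStream in, byte[] buffer, int offset, int length) throws IOException {
        int total = 0;
        while (total < length) {
            // read() may deliver fewer bytes than requested, so keep asking for the remainder
            int n = in.read(buffer, offset + total, length - total);
            if (n == -1) {
                throw new EOFException("stream ended after " + total + " of " + length + " bytes");
            }
            total += n;
        }
    }

    /** Hypothetical test helper: a stream that returns at most 'maxChunk' bytes per read call. */
    static InputStream trickle(byte[] data, int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] message = new byte[100];
        // The simulated stream delivers 30 + 30 + 30 + 10 bytes; the loop reassembles all 100
        readExactly(trickle(data, 30), message, 0, message.length);
        System.out.println(Arrays.equals(data, message));
    }
}
```

The JDK already ships equivalents: DataInputStream.readFully(byte[]) and, since Java 9, InputStream.readNBytes(byte[], int, int). The hand-written loop is shown only to make the mechanism visible.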

Thing is: there is a whole, huge stack of elements involved in such read operations: network, operating system, JVM. You can't control what exactly happens; and thus you cannot and should not build any implicit assumptions like this into your code.

GhostCat answered Mar 15 '23 21:03


While you might see this behaviour on a given machine, especially over loopback, it can change once you start using real networks and different hardware.

If you send messages with enough of a delay, and read them fast enough, you will see one message at a time. However, if messages are written close enough together or your reader is delayed in any way, you can get multiple messages in a single read.

Also, if your message is large enough, e.g. around the MTU or more, a single message can be broken up even if your buffer is more than large enough.
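Since both cases can occur (several messages in one read, or one message spread over several reads), message boundaries have to come from the protocol itself. A rough sketch, assuming each message is a BER element with a single-byte tag and a definite length, which is how LDAP messages are encoded: the hypothetical BerFramer class below collects whatever read() delivers and hands back only complete messages. It does not validate the tag and caps the length field at 3 bytes to keep the arithmetic simple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BerFramer {

    // Bytes received so far that do not yet form a complete message
    private byte[] pending = new byte[0];

    /** Feed the first 'length' bytes of one read() result; returns every message completed by it. */
    public List<byte[]> feed(byte[] chunk, int length) {
        byte[] joined = Arrays.copyOf(pending, pending.length + length);
        System.arraycopy(chunk, 0, joined, pending.length, length);

        List<byte[]> messages = new ArrayList<>();
        int pos = 0;
        while (true) {
            int total = frameLength(joined, pos, joined.length);
            if (total < 0 || pos + total > joined.length) {
                break; // header or body still incomplete, wait for more data
            }
            messages.add(Arrays.copyOfRange(joined, pos, pos + total));
            pos += total;
        }
        pending = Arrays.copyOfRange(joined, pos, joined.length);
        return messages;
    }

    /** Total size (tag + length field + content) of the element at 'pos', or -1 if the header is incomplete. */
    static int frameLength(byte[] data, int pos, int end) {
        if (end - pos < 2) {
            return -1;
        }
        int first = data[pos + 1] & 0xFF;
        if (first < 0x80) {
            return 2 + first; // short form: the byte itself is the content length
        }
        int count = first & 0x7F; // long form: number of length bytes that follow
        if (count == 0 || count > 3) {
            throw new IllegalArgumentException("BER length form not supported by this sketch");
        }
        if (end - pos < 2 + count) {
            return -1;
        }
        int contentLength = 0;
        for (int i = 0; i < count; i++) {
            contentLength = (contentLength << 8) | (data[pos + 2 + i] & 0xFF);
        }
        return 2 + count + contentLength;
    }
}
```

In the loop from the question this would be used as framer.feed(response, bytes_read) on every iteration, processing whatever list comes back, which may be empty. A real client would more likely hand the stream to an existing LDAP library than frame messages by hand.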

Peter Lawrey answered Mar 15 '23 23:03