 

Java socket bug on linux (0xFF sent, -3 received)

While working on a WebSocket server in Java, I came across this strange bug. I've reduced it to two small Java files: one is the server, the other is the client. The client simply sends 0x00, the string Hello, and then 0xFF (per the WebSocket specification).
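For reference, a frame in the format described above could be written with a sketch like the following. This is not the question's Client.java; the class name FrameWriter is hypothetical, and encoding the payload as UTF-8 is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not the question's Client.java): writes one frame
// in the 0x00 <payload> 0xFF format described in the question.
class FrameWriter {
    static void writeFrame(OutputStream out, String text) throws IOException {
        out.write(0x00);                                  // start-of-frame marker
        out.write(text.getBytes(StandardCharsets.UTF_8)); // payload; UTF-8 is an assumption
        out.write(0xFF);                                  // end-of-frame marker (low 8 bits of the int)
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeFrame(buf, "Hello");
        // Print each byte as an unsigned value, space-separated.
        StringBuilder sb = new StringBuilder();
        for (byte b : buf.toByteArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(b & 0xFF);
        }
        System.out.println(sb); // 0 72 101 108 108 111 255
    }
}
```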

On my Windows machine, the server prints the following:

Listening
byte: 0
72 101 108 108 111 recieved: 'Hello'

On my Unix box, however, the same code prints the following:

Listening
byte: 0
72 101 108 108 111 -3

Instead of 0xFF, the server receives -3, so it never breaks out of the loop and never prints what it has received.

The important part of the code looks like this:

byte b = (byte)in.read();
System.out.println("byte: "+b);

StringBuilder input = new StringBuilder();
b = (byte)in.read();
while((b & 0xFF) != 0xFF){
 input.append((char)b);
 System.out.print(b+" ");
 b = (byte)in.read();
}
inputLine = input.toString();

System.out.println("recieved: '" + inputLine+"'");
if(inputLine.equals("bye")){
 break;
}

I've also uploaded the two files to my server:

  • Server.java
  • Client.java

My Windows machine is running Windows 7, and my Linux machine is running Debian.

Edit:
When b is an int, it still behaves strangely. I send 0xFF (255) but receive 65533 (not 65535 or 255).

Marius asked Dec 07 '22 03:12


2 Answers

The problem isn't in the code you've shown. It's here:

in = new BufferedReader(new InputStreamReader(socket.getInputStream()));

You're dealing with binary data, so you should be using the raw InputStream. Don't wrap it in a Reader, which is meant for reading characters.
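A minimal sketch of reading one frame from the raw stream, assuming the same 0x00 ... 0xFF framing and a UTF-8 payload (an assumption; check the spec). The class and method names are hypothetical. Each read() result stays an int until it has been compared with -1 and 0xFF, and the payload bytes are decoded once, as a whole, at the end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch: read one 0x00 ... 0xFF frame straight from the raw InputStream.
// read() results are kept as ints, so -1 (end of stream) and 255 (the
// 0xFF terminator) stay distinct.
class RawFrameReader {
    static String readFrame(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) {
            throw new EOFException("stream closed before frame start");
        }
        if (b != 0x00) {
            throw new IOException("expected 0x00 frame start, got " + b);
        }
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        while ((b = in.read()) != 0xFF) {
            if (b == -1) {
                throw new EOFException("stream closed inside frame");
            }
            payload.write(b);
        }
        // Decode all payload bytes together; UTF-8 is an assumption.
        return new String(payload.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = {0x00, 'H', 'e', 'l', 'l', 'o', (byte) 0xFF};
        InputStream in = new ByteArrayInputStream(frame); // stands in for socket.getInputStream()
        System.out.println("received: '" + readFrame(in) + "'"); // received: 'Hello'
    }
}
```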

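You're receiving 65533 because that's the integer value of the Unicode replacement character (U+FFFD), which the decoder substitutes when the incoming bytes can't be decoded into a real character. In your original byte version, casting 65533 (0xFFFD) to byte keeps only the low eight bits, 0xFD, which is -3 as a signed byte. The exact behaviour of your current code will depend on the default character encoding on your system, which again isn't something you should rely on.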
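The symptom can be reproduced without a socket. This hypothetical demo feeds a single 0xFF byte through an InputStreamReader with an explicit UTF-8 charset, standing in for a UTF-8 platform default:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Reproduces the symptom: a lone 0xFF byte read through a UTF-8 Reader.
class ReplacementDemo {
    public static void main(String[] args) throws IOException {
        byte[] bytes = {(byte) 0xFF};
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        int c = reader.read();
        System.out.println(c);             // 65533 (U+FFFD, the replacement character)
        System.out.println((byte) c);      // -3: only the low byte 0xFD survives the cast
        System.out.println(reader.read()); // -1: end of stream
    }
}
```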

Further, you're assuming each byte should translate to a single character; essentially you're assuming ISO-8859-1. I haven't checked the spec, but I doubt that's what you should be using.
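To see why the byte-per-character assumption matters, this sketch (hypothetical names; the word "café" is just an example) encodes a non-ASCII string as UTF-8 and then rebuilds it one byte per char, which is equivalent to decoding it as ISO-8859-1:

```java
import java.nio.charset.StandardCharsets;

// Shows what goes wrong when each byte is treated as one char (ISO-8859-1)
// but the sender actually encoded the text as UTF-8.
class ByteAsCharDemo {
    // One char per byte, using the unsigned byte value as the code point.
    static String bytesAsChars(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "café"; the escape avoids source-encoding issues.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8); // 5 bytes: the e-acute is 0xC3 0xA9
        String perByte = bytesAsChars(utf8);
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(perByte.length()); // 5 (two wrong chars instead of one)
        System.out.println(decoded.length()); // 4
        System.out.println(perByte.equals(new String(utf8, StandardCharsets.ISO_8859_1))); // true
    }
}
```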

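Finally, you're not checking whether read() returns -1, which indicates that the client has closed the stream. Because you cast the result to byte straight away, -1 and a genuine 0xFF byte end up as the same value, so check the int before casting.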
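A short demo of why the check has to happen on the int, before any cast to byte (the class name is hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// End of stream (-1) and a real 0xFF byte (255) are different ints,
// but they collapse to the same byte value after a (byte) cast.
class EofVsFF {
    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {(byte) 0xFF});
        int first = in.read();  // 255: the real 0xFF byte
        int second = in.read(); // -1: end of stream
        System.out.println(first + " " + second);          // 255 -1
        System.out.println((byte) first == (byte) second); // true: both become -1
    }
}
```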

Jon Skeet answered Dec 11 '22 10:12


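A different solution from Jon's above: simply specify the charset as ISO-8859-1. Without it, InputStreamReader uses the platform's default charset, which is typically UTF-8 on Linux but typically windows-1252 on Windows, where 0xFF decodes to 'ÿ' (255); that is why the code happened to work on Windows.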

in = new BufferedReader(new InputStreamReader(kkSocket.getInputStream(), "ISO-8859-1"));
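A sketch of this approach end to end, using the StandardCharsets.ISO_8859_1 constant instead of the charset name string and a ByteArrayInputStream in place of kkSocket.getInputStream(); the class and method names are hypothetical:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Sketch of the ISO-8859-1 approach: every byte 0x00-0xFF decodes to the
// char with the same value, so the 0xFF terminator arrives as 255.
class Latin1FrameReader {
    static String readFrame(BufferedReader in) throws IOException {
        int c = in.read();
        if (c != 0x00) {
            throw new IOException("expected frame start, got " + c); // also covers -1
        }
        StringBuilder input = new StringBuilder();
        while ((c = in.read()) != 0xFF) {
            if (c == -1) {
                throw new IOException("stream closed inside frame");
            }
            input.append((char) c);
        }
        return input.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] frame = {0x00, 'H', 'e', 'l', 'l', 'o', (byte) 0xFF};
        InputStream raw = new ByteArrayInputStream(frame); // stands in for kkSocket.getInputStream()
        BufferedReader in = new BufferedReader(new InputStreamReader(raw, StandardCharsets.ISO_8859_1));
        System.out.println("received: '" + readFrame(in) + "'"); // received: 'Hello'
    }
}
```

Because this still maps one byte to one char, Jon's point about non-ASCII payloads still applies.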

That way each byte is decoded to the character with the same numeric value (0x00 to 0xFF map to U+0000 to U+00FF), so your 0xFF terminator arrives as 255.

This is needed because 0xFF, your final byte, is never valid in UTF-8. The other option is to change Java's default charset to ISO-8859-1. http://en.wikipedia.org/wiki/UTF-8#Codepage_layout

I remember when Java changed from throwing an exception to replacing the char with the replacement character (int 65533).
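Both behaviours are available today as CodingErrorAction settings on java.nio.charset.CharsetDecoder, which makes the difference easy to see. A sketch with a hypothetical class name:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Compares REPLACE (substitute U+FFFD) with REPORT (throw) for malformed UTF-8.
class DecoderActions {
    static String decode(byte[] bytes, CodingErrorAction action) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] bytes = {0x00, 'H', 'i', (byte) 0xFF};
        String replaced = decode(bytes, CodingErrorAction.REPLACE);
        System.out.println((int) replaced.charAt(replaced.length() - 1)); // 65533
        try {
            decode(bytes, CodingErrorAction.REPORT);
        } catch (MalformedInputException e) {
            System.out.println("REPORT threw MalformedInputException");
        }
    }
}
```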

Tristan answered Dec 11 '22 11:12