
Socket InputStream and UTF-8

I'm trying to make a chat with Java. Everything works fine, except that special characters don't work. I think it's an encoding problem, because in my OutputStream I encode the string in UTF-8 like this:

  protected void send(String msg) {
    
        try {
          msg+="\r\n";            
          OutputStream outStream = socket.getOutputStream();              
          outStream.write(msg.getBytes("UTF-8"));
          System.out.println(msg.getBytes("UTF-8"));
          outStream.flush();
        }
        catch(IOException ex) {
          ex.printStackTrace();
        }
      }

But in my receive method I didn't find a way to do this:

public String receive() throws IOException {
   
    String line = "";
    InputStream inStream = socket.getInputStream();    
                
    int read = inStream.read();
    while (read!=10 && read > -1) {
      line+=String.valueOf((char)read);
      read = inStream.read();
    }
    if (read==-1) return null;
    line+=String.valueOf((char)read);       
    return line; 
    
  }

So is there a quick way to specify that the bytes read from the stream are encoded in UTF-8?

EDIT: Okay, I tried with the BufferedReader like this:

 public String receive() throws IOException {
    
    String line = "";           
    in = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8"));           
    String readLine = "";   
    
    while ((readLine = in.readLine()) != null) {
        line+=readLine;
    }
    
    System.out.println("Line:"+line);
    
    return line;
   
  }

But it doesn't work. It seems that the socket doesn't receive anything.

Davide Rain asked Dec 05 '22 06:12



1 Answer

Trying to shed more light for future visitors.

Rule of thumb: the server and client HAVE TO agree on the encoding scheme. If the client sends data encoded with one scheme and the server reads it with another, the expected results can NEVER be achieved.

An important note for anyone who tries to test this: do not encode with ASCII on the client side and decode with UTF8 on the server side. UTF8 is backward compatible with ASCII, so the mismatch goes unnoticed and you may feel that the "rule of thumb" is wrong. It is not. Use UTF8 on the client side and UTF16 on the server side instead, and you will see the problem.
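The test described above can be tried without any sockets. The following is only a minimal sketch: the class name `EncodingMismatchDemo` and the helper `transcode` are made up for illustration, and the "wire" is just a byte array standing in for what would travel over the socket.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Hypothetical helper: encodes text with one charset (the "client")
    // and decodes the resulting bytes with another (the "server").
    public static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        byte[] wire = text.getBytes(encodeWith);
        return new String(wire, decodeWith);
    }

    public static void main(String[] args) {
        String plain = "hello";
        String accented = "h\u00e9llo"; // "héllo", written with an escape to avoid source-encoding issues

        // ASCII on one side, UTF-8 on the other: the mismatch goes unnoticed for ASCII-only text.
        System.out.println(plain.equals(
                transcode(plain, StandardCharsets.US_ASCII, StandardCharsets.UTF_8))); // true

        // UTF-8 on one side, UTF-16 on the other: the text does not survive.
        System.out.println(accented.equals(
                transcode(accented, StandardCharsets.UTF_8, StandardCharsets.UTF_16))); // false

        // Same encoding on both sides: the text survives.
        System.out.println(accented.equals(
                transcode(accented, StandardCharsets.UTF_8, StandardCharsets.UTF_8))); // true
    }
}
```

The first check passing is exactly why an ASCII/UTF8 test can make the rule of thumb look wrong.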

Encoding with sockets

I guess the single most important thing to understand is this: over the socket you are ultimately sending BYTES, and everything depends on how those bytes were encoded.

For example, if I send input to the server (over a client-server socket) from my Windows command prompt, the data will be encoded using some encoding scheme (I really do not know which). If I send data to the server from another client program, I can specify the encoding scheme I want for my client socket's output stream, and all the data will then be converted into BYTES using that scheme and sent over the socket.

So in the end I am still sending BYTES over the wire, but they are encoded using the scheme I specified. If the server then uses a different encoding scheme while reading from the socket's input stream, the expected results cannot be achieved. If it uses the same encoding scheme as the client, everything works.

Answering this question

In Java, there are special "bridge" classes, InputStreamReader and OutputStreamWriter, which you can use to specify the encoding of a stream.

PLEASE NOTE: in Java, InputStream and OutputStream are BYTE streams, so everything read from or written to them is BYTES. You cannot specify an encoding on InputStream or OutputStream objects themselves, which is why you need the bridge classes.
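To make the note concrete, here is a small sketch of what goes wrong in the question's first receive method, which casts every byte to a char. The class and method names are hypothetical, and a ByteArrayInputStream stands in for the socket's input stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ByteVersusCharDemo {

    // Mimics the question's loop: each byte returned by read() is cast to a char on its own.
    public static String castEachByte(InputStream in) {
        try {
            StringBuilder sb = new StringBuilder();
            for (int b = in.read(); b != -1; b = in.read()) {
                sb.append((char) b);
            }
            return sb.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Lets the bridge class decode the bytes, so a multi-byte sequence becomes one character.
    public static String decodeAsUtf8(InputStream in) {
        try {
            Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder();
            for (int c = reader.read(); c != -1; c = reader.read()) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is two bytes in UTF-8: 0xC3 0xA9.
        byte[] wire = "\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(castEachByte(new ByteArrayInputStream(wire)).length()); // 2
        System.out.println(decodeAsUtf8(new ByteArrayInputStream(wire)).length()); // 1
    }
}
```

Casting byte by byte turns the single accented character into two unrelated characters, while plain ASCII text would pass through unchanged, which matches the symptom described in the question.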

Below are client and server code snippets showing how to specify the encoding on the client's output stream and the server's input stream.

As long as the same encoding is specified at both ends, everything will work.

Client side:

        Socket clientSocket = new Socket("abc.com", 25050);
        OutputStreamWriter clientSocketWriter = new OutputStreamWriter(clientSocket.getOutputStream(), "UTF8");

Server side:

    ServerSocket serverSocket = new ServerSocket(8001);
    Socket clientSocket = serverSocket.accept();
    // PLEASE NOTE: the important part is that the encoding is specified on the socket's input stream.
    // Java's InputStream is a BYTE stream, so the bridge class InputStreamReader is used to specify UTF8.
    // The data still arrives from the client socket as bytes, but those bytes are decoded as UTF8.
    // If the encoding here differs from the one the client specifies on its output stream,
    // the data cannot be read properly and may come out as all "?".
    InputStreamReader clientSocketReader = new InputStreamReader(clientSocket.getInputStream(), "UTF8");
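
Putting the two snippets together, a line-based send and receive could look roughly like the sketch below. It rests on several assumptions that are not in the original: the class name `Utf8LineChat` and its three helpers are hypothetical, both ends run in one process over the loopback interface purely for demonstration, and messages are terminated with "\r\n" as in the question. Note that `readLine()` returns null only when the peer closes the connection, so a loop like the one in the question's edit keeps waiting for as long as the socket stays open; the sketch reads exactly one line per call instead and reuses a single reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class Utf8LineChat {

    // Wraps the byte stream once; reuse the returned reader for every later message,
    // because a BufferedReader may already hold bytes of the next message in its buffer.
    public static BufferedReader utf8Reader(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Encodes one message as UTF-8, terminates it with CRLF, and flushes it to the stream.
    public static void sendLine(OutputStream out, String msg) {
        try {
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            writer.write(msg + "\r\n");
            writer.flush();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Reads exactly one line; returns null once the peer has closed the connection.
    public static String receiveLine(BufferedReader reader) {
        try {
            return reader.readLine();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port; the loopback address keeps the demo local.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            BufferedReader reader = utf8Reader(accepted.getInputStream());
            sendLine(client.getOutputStream(), "h\u00e9llo");
            sendLine(client.getOutputStream(), "w\u00f6rld \u20ac");
            // What the console shows depends on the terminal's own encoding.
            System.out.println(receiveLine(reader));
            System.out.println(receiveLine(reader));
        }
    }
}
```

`StandardCharsets.UTF_8` is used here instead of the string "UTF8" only because it avoids the checked UnsupportedEncodingException; both name the same encoding.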
hagrawal answered Dec 11 '22 12:12
