How to read unicode characters from server socket

I need to receive a Unicode (UTF-8) string sent by a client on the server side. The length of the string is not known in advance.

ServerSocket serverSocket = new ServerSocket(567);
Socket clientSocket = serverSocket.accept();
PrintWriter out = new PrintWriter(clientSocket.getOutputStream(), true);
BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream()));

I can read the data using in.read() (until it returns -1), but the problem is that the string is Unicode, which, as I understand it, means every character is represented by two bytes. So converting each result of read() to a character, which would work for plain ASCII, makes no sense here.
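As a side note on byte widths: the number of bytes per character depends on the encoding rather than on "Unicode" as such. The sketch below is only an illustration (the class and method names are made up for this example) and compares encoded lengths in UTF-8 and UTF-16LE using the standard String.getBytes(Charset) call.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
class EncodedLengths {

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies when encoded as UTF-16LE (no byte order mark).
    static int utf16leLength(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));        // 1
        System.out.println(utf8Length("\u00e9"));   // 2 (e with acute accent)
        System.out.println(utf8Length("\u20ac"));   // 3 (euro sign)
        System.out.println(utf16leLength("a"));     // 2
    }
}
```

So "two bytes per character" matches UTF-16 for most common characters, while UTF-8 uses one byte for ASCII and more for other characters.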

Update

As per the suggestions below, I created the reader as follows:

BufferedReader in = new BufferedReader(new InputStreamReader(clientSocket.getInputStream(),"UTF-8"));

I've changed the client side to send a newline (#10#13) after each string. The new problem is that I get garbage instead of the real string when I call:

in.readLine();

When I print the result, I get a nonsense string (I cannot even copy it here), although I am not dealing with non-Latin characters or anything unusual.

To see what's going on, I added the following code:

int j = 0;
while (j < 255) {
    j++;
    // in.read() returns the next decoded character as an int (-1 at end of stream)
    System.out.print(in.read() + ", ");
}

Here I just print everything received. If I send "ab", I get:

97, 0, 98, 0, 10, 13, 

This is what one would expect, but then why doesn't the readLine method produce "good" results? If we can't find the actual cause, should I collect the bytes (as above) and build the string from them? How would I do that?
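One way to collect raw bytes and build a string from them is sketched below. It rests on an assumption drawn only from the dump above: the pattern 97, 0, 98, 0 looks like UTF-16LE (two bytes per character, low byte first) rather than UTF-8. The class and method names are hypothetical, and a ByteArrayInputStream stands in for clientSocket.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original question.
class RawBytesDecoder {

    // Reads raw bytes until the stream ends, then decodes them in one step.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Payload bytes from the dump in the question, without the line terminator.
        byte[] payload = {97, 0, 98, 0};

        // Assumption: the client actually sends UTF-16LE.
        String asUtf16 = readAll(new ByteArrayInputStream(payload), StandardCharsets.UTF_16LE);
        // Decoding the same bytes as UTF-8 keeps the zero bytes as NUL characters.
        String asUtf8 = readAll(new ByteArrayInputStream(payload), StandardCharsets.UTF_8);

        System.out.println(asUtf16);         // ab
        System.out.println(asUtf8.length()); // 4
    }
}
```

If that assumption holds, the NUL characters left over from UTF-8 decoding could explain the unprintable result of readLine(). Note that on a real socket, reading until -1 blocks until the client closes the connection, so a length prefix or a terminator would be needed to delimit individual strings.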

P.S. Just a quick note: I am on Windows.

Gonzalez asked Feb 08 '26 02:02



2 Answers

Use new InputStreamReader(clientSocket.getInputStream(), "UTF-8") to specify the charset used to decode the InputStream coming from your client.
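A minimal, self-contained sketch of this suggestion follows. It assumes the client really sends UTF-8 followed by a line feed; the class and method names are made up for illustration, and a ByteArrayInputStream stands in for the socket stream. It uses the StandardCharsets.UTF_8 constant instead of the "UTF-8" string, which avoids the checked UnsupportedEncodingException.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original answer.
class Utf8LineReader {

    // Wraps any InputStream (for example clientSocket.getInputStream())
    // in a UTF-8 reader and returns the first line without its terminator.
    static String readFirstLine(InputStream in) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        return reader.readLine();
    }

    public static void main(String[] args) throws IOException {
        // Simulated wire data: "h\u00e9llo" encoded as UTF-8, terminated by a line feed.
        byte[] wire = "h\u00e9llo\n".getBytes(StandardCharsets.UTF_8);
        String line = readFirstLine(new ByteArrayInputStream(wire));
        System.out.println(line.length()); // 5
    }
}
```

This only helps when the sender's encoding matches; if the client encodes with something else, the reader's charset has to be changed accordingly.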

Nicolas Filotto answered Feb 09 '26 17:02



When creating the InputStreamReader, you can set the encoding like this:

BufferedReader in =
    new BufferedReader(
        new InputStreamReader(clientSocket.getInputStream(), "UTF-8"));
user987339 answered Feb 09 '26 17:02



