When a server sends UTF-8 bytes, how do you read them without the characters turning into raw bytes (\x40, etc.)?
UTF-8 is not a character set but an encoding of Unicode. It is also backward compatible with ASCII: ASCII characters keep their single-byte codes (0x00 to 0x7F), and every byte of a multi-byte sequence has its high bit set (0x80 to 0xFF), a range that 7-bit ASCII never uses.
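A quick check of that property in Ruby (the sample strings are arbitrary, chosen only for illustration): pure ASCII bytes are already valid UTF-8, and every byte of a multi-byte character is 0x80 or above, so searching the raw bytes for an ASCII delimiter such as "\n" can never land inside a character.

```ruby
# Sample strings chosen for illustration only.
ascii_bytes = "GET / HTTP/1.1\r\n".b   # ASCII text as raw (binary) bytes
multi_byte  = "éᚁ😀"                  # characters outside ASCII

# ASCII bytes need no conversion: retagged as UTF-8 they are already valid.
ascii_as_utf8 = ascii_bytes.force_encoding(Encoding::UTF_8)
puts ascii_as_utf8.valid_encoding?             # => true

# Every byte of a multi-byte UTF-8 character has its high bit set.
puts multi_byte.bytes.all? { |b| b >= 0x80 }   # => true
```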
UTF-8 encodes each Unicode character as one, two, three, or four bytes; UTF-16 encodes each character as either two or four bytes. The names reflect the size of the code unit: 8 bits for UTF-8 and 16 bits for UTF-16.
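A small Ruby comparison of the byte counts for a few sample characters (encoded_sizes is a hypothetical helper name). UTF-16LE is used instead of plain "UTF-16" so that no byte-order mark is added to the count.

```ruby
# Returns [UTF-8 byte count, UTF-16 byte count] for one character.
def encoded_sizes(ch)
  [ch.bytesize, ch.encode(Encoding::UTF_16LE).bytesize]
end

%w[A é ᚁ 😀].each do |ch|
  utf8, utf16 = encoded_sizes(ch)
  puts format("U+%04X  UTF-8=%d  UTF-16=%d", ch.ord, utf8, utf16)
end
# U+0041  UTF-8=1  UTF-16=2
# U+00E9  UTF-8=2  UTF-16=2
# U+1681  UTF-8=3  UTF-16=2
# U+1F600  UTF-8=4  UTF-16=4
```

The last character lies outside the Basic Multilingual Plane, so UTF-16 needs a surrogate pair (four bytes) for it.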
UTF-8 basics. UTF-8 is a variable-width character encoding used for electronic communication. It is defined by the Unicode Standard and also by the International Organization for Standardization (ISO) in ISO/IEC 10646; the name stands for Unicode (or Universal Coded Character Set) Transformation Format, 8-bit. Its four-byte form carries 21 bits of payload, so the scheme can address up to 2,097,152 (2^21) code points. That is more than enough for Unicode, whose code space runs from U+0000 to U+10FFFF: after excluding the 2,048 surrogate code points, it holds 1,112,064 Unicode scalar values, the code points UTF-8 is allowed to encode.
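The figures above can be recomputed rather than quoted. This sketch also uses Integer#chr to check that Ruby refuses code points that are not Unicode scalar values (utf8_encodable? is a hypothetical helper name).

```ruby
# The numbers from the paragraph above, computed rather than quoted.
FOUR_BYTE_CAPACITY = 2**21         # payload bits in 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
CODE_SPACE = 0x10FFFF + 1          # U+0000 through U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1   # U+D800 through U+DFFF, never encoded in UTF-8
SCALAR_VALUES = CODE_SPACE - SURROGATES

puts FOUR_BYTE_CAPACITY  # => 2097152
puts SCALAR_VALUES       # => 1112064

# Integer#chr raises RangeError for code points UTF-8 must not encode.
def utf8_encodable?(code_point)
  code_point.chr(Encoding::UTF_8)
  true
rescue RangeError
  false
end

p utf8_encodable?(0x10FFFF)  # => true
p utf8_encodable?(0xD800)    # => false (surrogate)
p utf8_encodable?(0x110000)  # => false (beyond U+10FFFF)
```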
I believe read_nonblock uses read, whose documentation says:
The resulted string is always ASCII-8BIT encoding.
That means you don't need to specify IO#set_encoding; instead, after you have read the whole string, you can force its encoding to UTF-8 using String#force_encoding.
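A minimal sketch of that approach. It assumes an IO.pipe as a stand-in for the server socket so it runs on its own, and read_as_utf8 is a hypothetical helper name; IO#read_nonblock and String#force_encoding are the real methods. The bytes come back tagged ASCII-8BIT, and force_encoding retags them as UTF-8 without changing them.

```ruby
# Hypothetical helper: read up to +maxlen+ bytes without blocking and retag
# them as UTF-8. IO::WaitReadable (no data yet) and EOFError (end of stream)
# from IO#read_nonblock propagate to the caller unchanged.
def read_as_utf8(io, maxlen = 4096)
  data = io.read_nonblock(maxlen)       # comes back tagged ASCII-8BIT
  data.force_encoding(Encoding::UTF_8)  # same bytes, new encoding tag; returns data
end

# An IO.pipe stands in for the server socket so the sketch runs on its own.
reader, writer = IO.pipe
writer.write("héllo ᚁ")
writer.close

text = read_as_utf8(reader)
p text.encoding         # => #<Encoding:UTF-8>
p text.valid_encoding?  # => true
reader.close
```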
I emphasized "whole" because you need to make sure the string ends with a complete Unicode character: if only part of a multi-byte character has been read, the string will end in an invalid UTF-8 sequence, and Ruby may complain about it further down the line (for example, with an ArgumentError reporting an invalid byte sequence in UTF-8).
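One way to guarantee that, sketched under the assumption that data arrives in arbitrary chunks (split_incomplete_utf8 is a hypothetical helper, not a Ruby API): keep the trailing bytes of an unfinished character in a buffer, prepend them to the next chunk, and retag only the complete prefix as UTF-8. The helper steps back over continuation bytes (10xxxxxx) to the last lead byte, whose bit pattern tells how long the final character should be.

```ruby
# Hypothetical helper. Splits raw bytes into [complete, rest]:
# +complete+ (tagged UTF-8) never ends partway through a multi-byte character;
# +rest+ (binary) holds the 0 to 3 bytes of an unfinished trailing character.
def split_incomplete_utf8(bytes)
  bytes = bytes.b                  # binary copy; the caller's string is untouched
  i = bytes.bytesize - 1
  steps = 0
  # Step back over up to 3 continuation bytes (bit pattern 10xxxxxx).
  while i >= 0 && steps < 3 && (bytes.getbyte(i) & 0xC0) == 0x80
    i -= 1
    steps += 1
  end
  return [bytes.force_encoding(Encoding::UTF_8), "".b] if i < 0

  lead = bytes.getbyte(i)
  needed =
    if lead < 0x80 then 1                # 0xxxxxxx: ASCII
    elsif (lead & 0xE0) == 0xC0 then 2   # 110xxxxx
    elsif (lead & 0xF0) == 0xE0 then 3   # 1110xxxx
    elsif (lead & 0xF8) == 0xF0 then 4   # 11110xxx
    else 1                               # invalid lead byte: left for valid_encoding? to report
    end
  have = bytes.bytesize - i

  if have < needed
    [bytes.byteslice(0, i).force_encoding(Encoding::UTF_8), bytes.byteslice(i, have)]
  else
    [bytes.force_encoding(Encoding::UTF_8), "".b]
  end
end

# Simulate "abᚁcd" arriving in two chunks that split ᚁ (E1 9A 81) after two bytes.
raw = "abᚁcd".b
chunks = [raw.byteslice(0, 4), raw.byteslice(4, 3)]

# Naively retagging the first chunk leaves an invalid trailing sequence.
p chunks[0].dup.force_encoding(Encoding::UTF_8).valid_encoding?  # => false

pending = "".b
decoded = chunks.map do |chunk|
  complete, pending = split_incomplete_utf8(pending + chunk)
  complete
end
p decoded.join == "abᚁcd"  # => true
```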
You can use IO#set_encoding to set a socket's external encoding to UTF-8.
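#!/usr/bin/env ruby
# -*- coding: utf-8 -*-
require 'socket'

# Listen on localhost; port 0 lets the OS pick a free port.
server_socket = TCPServer.new('localhost', 0)

# Server thread: sends one line of Ogham characters as raw UTF-8 bytes.
Thread.new do
  loop do
    session_socket = server_socket.accept
    # Binary (ASCII-8BIT) external encoding: the string's bytes are written as-is.
    session_socket.set_encoding 'ASCII-8BIT'
    session_socket.puts " ᚁ ᚂ ᚃ ᚄ ᚅ ᚆ ᚇ ᚈ ᚉ ᚊ ᚋ ᚌ ᚍ"
    session_socket.close
  end
end

client_socket = TCPSocket.new('localhost', server_socket.addr[1])
# Tag everything read from this socket as UTF-8.
client_socket.set_encoding 'UTF-8'
p client_socket.gets
# => " ᚁ ᚂ ᚃ ᚄ ᚅ ᚆ ᚇ ᚈ ᚉ ᚊ ᚋ ᚌ ᚍ\n"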
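To see what set_encoding changes without starting a server, this sketch swaps the TCP socket for an IO.pipe (an assumption for illustration; gets_with_encoding is a hypothetical helper name). The same bytes read with gets come back tagged UTF-8 or binary depending only on the reader's external encoding, assuming Encoding.default_internal is left at its default of nil (otherwise Ruby would also transcode).

```ruby
# Hypothetical helper: write +bytes+ into a pipe and read one line back
# with the reader's external encoding set to +encoding+.
def gets_with_encoding(bytes, encoding)
  reader, writer = IO.pipe
  reader.set_encoding(encoding)  # affects only how read data is tagged
  writer.write(bytes)
  writer.close
  line = reader.gets
  reader.close
  line
end

sent = "ᚁ ᚂ ᚃ\n".b  # raw UTF-8 bytes, tagged binary

utf8_line   = gets_with_encoding(sent, Encoding::UTF_8)
binary_line = gets_with_encoding(sent, Encoding::BINARY)

p utf8_line.encoding                     # => #<Encoding:UTF-8>
p utf8_line.bytes == binary_line.bytes   # => true: same bytes, different tag
```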