Matz wrote in his book that in order to use UTF-8, you must add a coding comment on the first line of your script. He gives us an example:
# -*- coding: utf-8 -*- # Specify Unicode UTF-8 characters
# This is a string literal containing a multibyte multiplication character
s = "2x2=4"
# The string contains 6 bytes which encode 5 characters
s.length # => 5: Characters: '2' '×' '2' '=' '4'
s.bytesize # => 6: Bytes (hex): 32 c3 97 32 3d 34
When he invokes bytesize, it returns 6, since the multiplication symbol × is outside the ASCII set and must be represented in UTF-8 with two bytes.
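Concretely, × is code point U+00D7, which is above the 0–127 ASCII range, so UTF-8 needs two bytes for it; for example:
'×'.ord            # => 215, i.e. U+00D7, above the ASCII range
'×'.bytesize       # => 2
'×'.unpack('C*')   # => [195, 151], the bytes 0xc3 0x97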
I tried the exercise, and even without the coding comment, Ruby recognized the multiplication symbol as two bytes:
'×'.encoding
=> #<Encoding:UTF-8>
'×'.bytes.to_a.map {|dec| dec.to_s(16) }
=> ["c3", "97"]
So it appears UTF-8 is the default encoding. Is this a change introduced in Ruby 2? His examples were written for Ruby 1.9.
Yes. UTF-8 has been the default source encoding only since Ruby 2.0; in Ruby 1.9 the default was US-ASCII, which is why the coding comment was needed there.
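A minimal sketch of the difference, assuming the source file itself is saved as UTF-8 bytes (the exact 1.9 error wording may vary by patch level):
# Source file with no coding comment, containing the line below.
#
# Ruby 1.9: the parser assumes US-ASCII and rejects the file:
#   SyntaxError: invalid multibyte char (US-ASCII)
#
# Ruby 2.0 and later: the parser assumes UTF-8, so it simply works:
s = "2×2=4"
s.encoding  # => #<Encoding:UTF-8>
s.bytesize  # => 6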
Since you know his examples target Ruby 1.9, it is worth skimming the release notes of the newer Ruby versions for the added features; the list is not that long.