Is UTF-8 the default encoding in Ruby v.2?

Tags: ruby, encoding

Matz wrote in his book that in order to use UTF-8, you must add a coding comment on the first line of your script. He gives us an example:

# -*- coding: utf-8 -*-   # Specify Unicode UTF-8 characters

# This is a string literal containing a multibyte multiplication character
s = "2×2=4"

# The string contains 6 bytes which encode 5 characters
s.length        # => 5: Characters:  '2'   '×'   '2'   '='   '4'
s.bytesize      # => 6: Bytes (hex): 32   c3 97  32    3d    34

When he invokes bytesize, it returns 6 because the multiplication symbol × is outside the ASCII set, and UTF-8 encodes it as two bytes.

I tried the exercise without the coding comment, and Ruby still recognized the multiplication symbol as two bytes:

'×'.encoding
 => #<Encoding:UTF-8> 
'×'.bytes.to_a.map {|dec| dec.to_s(16) }
 => ["c3", "97"] 

So it appears that UTF-8 is the default encoding. Is this a recent addition in Ruby 2? His examples were from Ruby 1.9.

Donato asked Nov 01 '22 10:11


1 Answer

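Yes. UTF-8 has been the default source encoding only since Ruby 2.0. In Ruby 1.9 the default source encoding was US-ASCII, which is why the book's example needs the coding comment.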

Since you know his examples were written for Ruby 1.9, check the features added in later versions of Ruby. There are not that many.
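If you want to check this on your own interpreter, something like the following sketch may help. It assumes the code is saved as a script file and run with Ruby 2.0 or later. The variable names are only illustrative, and the string is written with a `\u` escape so that it is UTF-8 regardless of the source encoding.

```ruby
# Sketch: inspect the source encoding and compare characters with bytes.
# No coding comment is used, so __ENCODING__ reports the interpreter's default
# source encoding for this file.
source_encoding = __ENCODING__

# "\u00D7" is the multiplication sign. A \u escape always yields a UTF-8
# string, so this literal does not depend on the source encoding.
s = "2\u00D72=4"

# Each byte of the string, rendered in hexadecimal.
hex_bytes = s.bytes.map { |b| b.to_s(16) }

puts "source encoding: #{source_encoding}"
puts "string encoding: #{s.encoding}"
puts "characters:      #{s.length}"
puts "bytes:           #{s.bytesize}"
puts "hex bytes:       #{hex_bytes.join(' ')}"
```

The character count and the byte count differ by one because the multiplication sign takes two bytes (c3 97) in UTF-8, while the other four characters take one byte each.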

sawa answered Nov 15 '22 07:11