I'm confused by some Ruby behavior. Look at the following code:
[127].pack("C") == "\x7f" # => true
This makes sense. Now:
[128].pack("C") # => "\x80"
"\x80" # => "\x80"
[128].pack("C") == "\x80" # => false
The pack option "C" stands for 8-bit unsigned (unsigned char), which should be fine to store a value of 128. Also, both strings print the same thing, so why are they not equal? Does this have something to do with encoding?
I'm on ruby 2.0.0p247.
It is false because the encodings differ:
[128].pack("C").encoding
#=> #<Encoding:ASCII-8BIT>
"\x80".encoding
#=> #<Encoding:UTF-8>
(using ruby 2.0.0p247 (2013-06-27 revision 41674) [x86_64-linux])
In Ruby 2.0 the default source encoding, and therefore the encoding of string literals, is UTF-8, whereas pack with the "C" directive returns an ASCII-8BIT (binary) encoded string.
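To see the mismatch directly, a small sketch along these lines compares the raw bytes and the encoding tags of the two strings. The variable names are only illustrative, and the literal is tagged as UTF-8 explicitly (an assumption that mirrors a Ruby 2.0 source file without a magic comment):

```ruby
packed = [128].pack("C")
# Tag the literal as UTF-8 explicitly so the result does not depend on
# this file's source encoding (assumption: mirrors a default Ruby 2.0 file).
literal = "\x80".dup.force_encoding(Encoding::UTF_8)

same_bytes    = packed.bytes.to_a == literal.bytes.to_a  # both hold the single byte 0x80
packed_valid  = packed.valid_encoding?                   # any byte is valid in a binary string
literal_valid = literal.valid_encoding?                  # a lone 0x80 is not valid UTF-8

puts packed.encoding.name
puts literal.encoding.name
puts same_bytes         # true
puts packed_valid       # true
puts literal_valid      # false
puts packed == literal  # false
```

So the bytes are identical; only the encoding tags differ, and the UTF-8 tagged string is not even valid UTF-8.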
Why is [127].pack('C') == "\x7f" true then?
However, [127].pack('C') == "\x7f" is true, because for the code points 0 to 127 ASCII and UTF-8 do not differ. Ruby's string comparison takes this into account (have a look at the Rubinius source code):
def ==(other)
[...]
return false unless @num_bytes == other.bytesize
return false unless Encoding.compatible?(self, other)
return @data.compare_bytes(other.__data__, @num_bytes, other.bytesize) == 0
end
The MRI C source is similar, but harder to understand.
We observe that the comparison checks for a compatible encoding. Let's try that:
Encoding.compatible?([127].pack("C"), "\x7f") #=> #<Encoding:ASCII-8BIT>
Encoding.compatible?([128].pack("C"), "\x80") #=> nil
We see that, beginning with code point 128, the comparison returns false even when both strings are made of the same bytes.
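The same three checks can be written in plain Ruby. This is only a rough sketch that mirrors the Rubinius snippet above; same_string? is a hypothetical helper name, not part of Ruby, and MRI's real implementation differs in detail:

```ruby
# Hypothetical helper mirroring the three checks in the Rubinius snippet:
# byte length, encoding compatibility, then the bytes themselves.
def same_string?(a, b)
  return false unless a.bytesize == b.bytesize
  return false unless Encoding.compatible?(a, b)
  a.bytes.to_a == b.bytes.to_a
end

# Literals tagged as UTF-8 explicitly (assumption: default Ruby 2.0 source file).
low  = "\x7f".dup.force_encoding(Encoding::UTF_8)
high = "\x80".dup.force_encoding(Encoding::UTF_8)

low_result  = same_string?([127].pack("C"), low)   # ASCII-only, so the encodings are compatible
high_result = same_string?([128].pack("C"), high)  # same byte, but incompatible encodings

puts low_result   # true
puts high_result  # false
```

The second call already fails at the Encoding.compatible? step, so the bytes are never compared.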
In Ruby 1.9, the default source file encoding is US-ASCII, while starting from Ruby 2.0 the default has changed to UTF-8. String literals like "\x80" are encoded according to the encoding of the source file that contains them (in a US-ASCII source file, a literal with a non-ASCII \x escape such as "\x80" ends up tagged as ASCII-8BIT). The encoding of [128].pack("C"), however, is always ASCII-8BIT.
So [128].pack("C") == "\x80" is false in Ruby 2.0, while it is true in Ruby 1.9.
Putting #coding:some_encoding in the first line of the source file (or just after the shebang) changes the source code encoding for that file.
#coding:ascii
puts([128].pack("C") == "\x80")
This outputs true in Ruby 2.0 as well.
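If changing the source encoding of the whole file is not desirable, another option is to retag just the literal as binary. A minimal sketch, assuming Ruby 2.0 or later for String#b; the variable names are illustrative:

```ruby
packed = [128].pack("C")
# Models a literal from a UTF-8 source file (assumption: no magic comment).
literal = "\x80".dup.force_encoding(Encoding::UTF_8)

binary_copy = literal.b                                     # copy tagged as ASCII-8BIT
retagged    = literal.dup.force_encoding(Encoding::BINARY)  # same effect via force_encoding

puts packed == binary_copy                    # true
puts packed == retagged                       # true
puts packed.bytes.to_a == literal.bytes.to_a  # true, comparing raw bytes ignores encodings
```

Neither call changes the bytes; only the encoding tag is replaced, which is enough for == to succeed.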