Why are two strings with same bytes and encoding not identical in Ruby 1.9?

Question

In Ruby 1.9.2, I found a way to make two strings that have the same bytes, same encoding, and are equal, but they have a different length and different characters returned by [].

Is this a bug? If it is not a bug, then I'd like to fully understand it. What kind of information is stored inside Ruby 1.9.2 String objects that allows these two strings to behave differently?

Below is the code that reproduces this behavior. The comments that start with #=> show you what output I am getting from this script, and the parenthetical words tell you my judgment of that output.

#!/usr/bin/ruby1.9
# coding: utf-8
string1 = "\xC2\xA2"       # A well-behaved string with one character (¢)
string2 = "".concat(0xA2)  # A bizarre string very similar to string1.
p    string1.bytes.to_a    #=> [194, 162]  (good)
p    string2.bytes.to_a    #=> [194, 162]  (good)
puts string1.encoding.name #=> UTF-8  (good)
puts string2.encoding.name #=> UTF-8  (good)
puts string1 == string2    #=> true   (good)
puts string1.length        #=> 1      (good)
puts string2.length        #=> 2      (weird!)
p    string1[0]            #=> "¢"    (good)
p    string2[0]            #=> "\xC2" (weird!)

I am running Ubuntu and compiled Ruby from source. My Ruby version is:

ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]

naruse · Accepted Answer

This is a bug in Ruby, and it was fixed in r29848.
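
For anyone stuck on a build that predates the fix, here is a minimal sketch of a possible workaround. It assumes the discrepancy comes from state cached on the String object rather than from its bytes, which this thread does not confirm. The helper name rebuild_from_bytes is hypothetical. It copies the raw bytes into a fresh string and reapplies the original encoding, so length and [] have to be derived from the bytes again.

```ruby
# coding: utf-8

# Hypothetical helper (not part of Ruby or of the original post):
# build a brand-new string from the raw bytes of +str+ and tag it with
# the same encoding, so nothing cached on the old object carries over.
def rebuild_from_bytes(str)
  str.bytes.to_a.pack("C*").force_encoding(str.encoding)
end

string1 = "\xC2\xA2"       # one character (¢), as in the question
string2 = "".concat(0xA2)  # the string that misbehaves on 1.9.2p0
rebuilt = rebuild_from_bytes(string2)

p string2.length           # 2 on the affected build per the question, 1 once fixed
p rebuilt.length           # character count recomputed from the bytes
p rebuilt == string1       # same bytes, same encoding
p rebuilt[0] == string1[0] # first character should now match string1
```

If the code point is known up front, [0xA2].pack("U") builds the same UTF-8 string without calling concat at all.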

Matt Aimonetti · Answer

Matz mentioned this question via Twitter:

http://twitter.com/matz_translator/status/6597021662187520

http://twitter.com/matz_translator/status/6597055132733440

"It's hard to determine as a bug but, it's not acceptable to leave it as is. I'd prefer to fix this issue."

Tags: string, ruby, encoding, ruby-1.9

David Grayson
