I'm trying to understand why this snippet of code does not work in Ruby 1.9.2. I'd also like to know how to change it so that it works. Here is the snippet:
ruby-1.9.2-p290 :009 > str = "hello world!"
=> "hello world!"
ruby-1.9.2-p290 :010 > str.gsub("\223","")
RegexpError: invalid multibyte character: /?/
from (irb):10:in `gsub'
Your Ruby is in UTF-8 mode, but "\223" is not a valid UTF-8 string. In UTF-8, any byte with the eighth bit set is part of a multi-byte character, and you need to keep reading more bytes to get the full character. That means "\223" on its own is only a fragment of a UTF-8 encoded character, hence your error.
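If you want to confirm this yourself, here is a minimal sketch that inspects the string. It assumes the source file is read as UTF-8 (the default since Ruby 2.0; on 1.9 the magic comment on the first line is needed), and the variable names are only for illustration:

```ruby
# encoding: utf-8

bad = "\223"                  # a single byte, octal 223 = 0x93 = 147 decimal

# The string is tagged as UTF-8 but its bytes do not form a valid character.
valid = bad.valid_encoding?

# UTF-8 continuation bytes have the bit pattern 10xxxxxx, so masking the top
# two bits gives 0b10000000. A continuation byte can never stand alone.
byte = bad.bytes.first
continuation = (byte & 0b1100_0000) == 0b1000_0000

puts "bytes: #{bad.bytes.inspect}"
puts "valid UTF-8? #{valid}"
puts "lone continuation byte? #{continuation}"
```

Calling valid_encoding? before gsub is a cheap way to catch this kind of input early.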
0223 and 0224 (147 and 148 decimal) are the "smart" quotes in the Windows-1252 character set, but Windows-1252 isn't UTF-8. In UTF-8 you want "\u201c" and "\u201d" for the quotes:
>> puts "\u201c"
“
>> puts "\u201d"
”
So if you're trying to strip out the quotes then you probably want one of these:
str.gsub("\u201c", "").gsub("\u201d", "")
str.gsub(/[\u201c\u201d]/, '')
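If your data really does contain the raw 0223/0224 bytes, one possible approach is to transcode it before stripping. This is only a sketch and assumes the input is actually Windows-1252 text, which you would need to verify for your own data; the sample string and variable names are made up for illustration:

```ruby
# encoding: utf-8

# Hypothetical input: bytes that came from a Windows-1252 source.
# dup gives us a mutable copy before relabelling the encoding.
raw = "\223hello world!\224".dup.force_encoding("Windows-1252")

# Transcode to UTF-8; 0x93 and 0x94 become U+201C and U+201D.
utf8 = raw.encode("UTF-8")

# Now the character class from above matches the quotes.
clean = utf8.gsub(/[\u201c\u201d]/, "")

puts clean
```

force_encoding only relabels the bytes, while encode rewrites them, so the order of the two calls matters.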