
Ruby 1.9.2 Character Encoding: invalid multibyte character: /?/

Tags:

ruby

encoding

I'm trying to understand why this snippet of code does not work in Ruby 1.9.2, and how it should be changed to make it work. Here is the snippet:

ruby-1.9.2-p290 :009 > str = "hello world!"
 => "hello world!" 
ruby-1.9.2-p290 :010 > str.gsub("\223","")
RegexpError: invalid multibyte character: /?/
    from (irb):10:in `gsub'
dnstevenson asked Jan 18 '23 05:01



1 Answer

Your Ruby is in UTF-8 mode, but "\223" is not a valid UTF-8 string. In UTF-8, any byte with the eighth bit set is part of a multi-byte character and cannot stand on its own. "\223" (octal 223, hex 0x93) is a continuation byte: it is only valid after a lead byte, so by itself it is just a fragment of a UTF-8 encoded character, hence your error.
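
If you want to check this yourself, here is a minimal sketch. It assumes the string is tagged as UTF-8 (made explicit with force_encoding), and the variable names are only illustrative:

```ruby
# Sketch: look at the single byte behind "\223" (octal 223).
# force_encoding makes the UTF-8 assumption explicit; dup avoids
# modifying the literal itself.
lone = "\223".dup.force_encoding("UTF-8")

byte  = lone.getbyte(0)        # 147 (0x93)
bits  = byte.to_s(2)           # "10010011": the 10xxxxxx pattern of a continuation byte
valid = lone.valid_encoding?   # false: a continuation byte with no lead byte

puts byte
puts bits
puts valid
```

String#valid_encoding? is a cheap way to test incoming data before handing it to gsub or a regex.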

0223 and 0224 (147 and 148 decimal) are the "smart" quotes in the Windows-1252 character set, but Windows-1252 isn't UTF-8. In UTF-8 you want "\u201c" and "\u201d" for the quotes:

>> puts "\u201c"
“
>> puts "\u201d"
”
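
If the bytes really did come from Windows-1252 data, one option is to transcode them to UTF-8 first. This is only a sketch under that assumption (the sample text is made up); the two-argument form of String#encode treats the bytes as the source encoding given second:

```ruby
# Sketch: interpret the raw bytes as Windows-1252 and transcode to UTF-8.
# 0x93 (octal 223) and 0x94 (octal 224) are the curly quotes in Windows-1252.
raw  = "\223quoted\224"
utf8 = raw.encode("UTF-8", "Windows-1252")

puts utf8.encoding                     # UTF-8
puts utf8 == "\u201cquoted\u201d"      # true
```

After transcoding, the string is valid UTF-8 and the "\u201c" / "\u201d" patterns will match it.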

So if you're trying to strip out the quotes then you probably want one of these:

str.gsub("\u201c", "").gsub("\u201d", "")
str.gsub(/[\u201c\u201d]/, '')
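
As a quick sanity check, here is a sketch that runs both variants on a made-up sample string that actually contains the quotes. String#delete is shown as a third option, which is my own suggestion rather than part of the answer above:

```ruby
# Sketch: strip curly quotes from a UTF-8 string three equivalent ways.
str = "\u201chello world!\u201d"

chained    = str.gsub("\u201c", "").gsub("\u201d", "")  # two passes, string patterns
with_class = str.gsub(/[\u201c\u201d]/, "")             # one pass, character class
deleted    = str.delete("\u201c\u201d")                 # deletes every listed character

puts chained
puts with_class
puts deleted
```

All three return a new string and leave str untouched.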
mu is too short answered Jan 29 '23 15:01
