How do I delete non-UTF8 characters from a ruby string? I have a string that has for example "xC2" in it. I want to remove that char from the string so that it becomes a valid UTF8.
This:
text.gsub!(/\xC2/, '')
returns an error:
incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)
I was looking at text.unpack('U*') and string.pack as well, but did not get anywhere.
You can use encode for that. text.encode('UTF-8', :invalid => :replace, :undef => :replace)
For more info look into Ruby-Docs
You could do it like this
# encoding: utf-8 class String def validate_encoding chars.select(&:valid_encoding?).join end end puts "testing\xC2 a non UTF-8 string".validate_encoding #=>testing a non UTF-8 string
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With