I seems to be a very simple and much needed method. I need to remove all non ASCII characters from a string. e.g © etc. See the following example.
#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode
output:
UTF-8
Hello this a mixed str
ing © that I made.
When I feed this to Watir, it produces following error:incompatible character encodings: UTF-8 and ASCII-8BIT
So my problem is that I want to get rid of all non ASCII characters before using it. I will not know which encoding the source string "s" uses.
I have been searching and experimenting for quite some time now.
If I try to use
puts s.encode('ASCII-8BIT')
It gives the error:
: "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)
Use . replace() method to replace the Non-ASCII characters with the empty string.
In Ruby, we can permanently delete characters from a string by using the string. delete method. It returns a new string with the specified characters removed.
The regular expression represents all printable ASCII characters. ASCII code is the numerical representation of all the characters and the ASCII table extends from char NUL (Null) to DEL . The printable characters extend from CODE 32 (SPACE) to CODE 126 (TILDE[~]) .
You can just literally translate what you asked into a Regexp
. You wrote:
I want to get rid of all non ASCII characters
We can rephrase that a little bit:
I want to substitue all characters which don't thave the
ASCII
property with nothing
And that's a statement that can be directly expressed in a Regexp
:
s.gsub!(/\P{ASCII}/, '')
As an alternative, you could also use String#delete!
:
s.delete!("^\u{0000}-\u{007F}")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With