
How to remove all non-ASCII characters from a string in Ruby

Tags: ruby, watir

It seems to be a very simple and much-needed method. I need to remove all non-ASCII characters from a string, e.g. ©. See the following example.

#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode

output:

UTF-8
 Hello this a mixed string © that I made.

When I feed this to Watir, it produces the following error: incompatible character encodings: UTF-8 and ASCII-8BIT

So my problem is that I want to get rid of all non-ASCII characters before using the string. I will not know which encoding the source string "s" uses.

I have been searching and experimenting for quite some time now.

If I try to use

  puts s.encode('ASCII-8BIT')

It gives the error:

 : "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)

Nick asked Jul 08 '10 04:07




1 Answer

You can just literally translate what you asked into a Regexp. You wrote:

I want to get rid of all non ASCII characters

We can rephrase that a little bit:

I want to substitute all characters which don't have the ASCII property with nothing

And that's a statement that can be directly expressed in a Regexp:

s.gsub!(/\P{ASCII}/, '')
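
A minimal sketch of the same regexp applied to the string from the question, assuming the input is a valid UTF-8 string. It uses the non-destructive gsub, because gsub! returns nil when nothing was replaced, which is easy to trip over when the result is passed on. The helper name strip_non_ascii is made up for illustration and is not part of the answer.

```ruby
# coding: utf-8

# Hypothetical helper (name invented for this example).
# gsub returns a new string and leaves the argument untouched.
def strip_non_ascii(str)
  str.gsub(/\P{ASCII}/, '')
end

s = " Hello this a mixed string © that I made."
clean = strip_non_ascii(s)
puts clean              # the © is gone, the spaces around it remain
puts clean.ascii_only?  # true

# The bang version returns nil if the string had nothing to replace.
unchanged = "plain ASCII".dup.gsub!(/\P{ASCII}/, '')
p unchanged             # nil
```

If you want to modify the string in place, keep using s.gsub! as above, but don't rely on its return value.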

As an alternative, you could also use String#delete!:

s.delete!("^\u{0000}-\u{007F}")
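
For comparison, here is a small sketch of the delete variant on the question's string, again assuming valid UTF-8 input. The argument reads as "everything except the range U+0000 to U+007F". Like gsub!, delete! returns nil when it removed nothing, so the example uses delete to get a new string and shows the nil case separately.

```ruby
# coding: utf-8

s = " Hello this a mixed string © that I made."

# String#delete returns a copy with the listed characters removed;
# the leading ^ negates the set, so only U+0000..U+007F survive.
ascii_only = s.delete("^\u{0000}-\u{007F}")
puts ascii_only
puts ascii_only.ascii_only?  # true

# delete! modifies the receiver and returns nil if nothing was deleted.
result = "plain ASCII".dup.delete!("^\u{0000}-\u{007F}")
p result                     # nil
```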

Jörg W Mittag answered Oct 26 '22 11:10