
Removing non-alphanumeric characters without removing international characters in ruby

Tags:

regex

ruby

I want to remove non-alphanumeric characters from a string, but not international characters such as accented letters. I also want to keep whitespace. Here is what I have so far:

the_string = the_string.gsub(/[^a-z0-9 -]/i, '')

However, this also removes accented letters and other international alphabetic characters.
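A small sketch that reproduces the problem. The helper name `strip_ascii_only` and the sample string are made up for illustration; the regexp is the one from the question. Because the character class only lists the ASCII ranges `a-z` and `0-9`, every accented letter falls outside it and gets deleted.

```ruby
# Hypothetical helper wrapping the question's original regexp.
# [^a-z0-9 -] matches anything that is NOT an ASCII letter, ASCII digit,
# space, or hyphen, so accented letters like "é" and "ü" are removed too.
def strip_ascii_only(str)
  str.gsub(/[^a-z0-9 -]/i, '')
end

puts strip_ascii_only("Café Müller, São Paulo!")
# prints: Caf Mller So Paulo
```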

Solution that I used:

the_string = the_string.gsub(/[^\p{Alnum}\p{Space}-]/u, '')

It works! Thanks.

Kevin K asked Feb 11 '14 at 19:02


1 Answer

You can use character properties to do this:

the_string.gsub(/[^\p{Alnum} -]/, '')
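A minimal sketch of that approach applied to the same kind of input; the helper name `strip_non_alnum` is hypothetical. `\p{Alnum}` matches any Unicode letter or digit, so accented (and non-Latin) letters survive while punctuation is stripped.

```ruby
# Hypothetical helper: keep Unicode letters/digits (\p{Alnum}), a literal
# space, and a hyphen; delete every other character.
def strip_non_alnum(str)
  str.gsub(/[^\p{Alnum} -]/, '')
end

puts strip_non_alnum("Café Müller, São Paulo!")
# prints: Café Müller São Paulo
puts strip_non_alnum("Привет, мир 123!")
# prints: Привет мир 123
```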

You may also want to use \p{Space} instead of a literal space, so that other whitespace such as tabs, newlines, and non-breaking spaces is kept as well:

the_string.gsub(/[^\p{Alnum}\p{Space}-]/, '')
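To compare the two variants side by side, here is a sketch with two hypothetical helper names. The sample string deliberately contains a tab and a non-breaking space (U+00A0): the literal-space version drops both, while the `\p{Space}` version keeps them.

```ruby
# Variant 1: only the literal ASCII space counts as whitespace to keep.
def strip_keep_ascii_space(str)
  str.gsub(/[^\p{Alnum} -]/, '')
end

# Variant 2: any Unicode whitespace (\p{Space}) is kept, including
# tabs, newlines, and non-breaking spaces.
def strip_keep_any_space(str)
  str.gsub(/[^\p{Alnum}\p{Space}-]/, '')
end

sample = "naïve\tcafé\u00A0crème-brûlée!"
p strip_keep_ascii_space(sample) # tab and non-breaking space removed
p strip_keep_any_space(sample)   # tab and non-breaking space kept
```

In both variants the hyphen is the last character in the class, so it is treated as a literal `-` rather than a range.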

(This also keeps the - character, which you have in your regexp. Because it is the last character in the class, it is treated literally rather than as a range.)

matt answered Oct 24 '22 at 23:10