Remove all non-alphabetical, non-numerical characters from a string?

Tags: string, regex, ruby

If I wanted to remove characters such as .!,'"^-# from an array of strings while retaining all alphabetical and numeric characters, how would I go about it?

Allowed alphabetical characters should also include letters with diacritical marks, such as à or ç.

asked Feb 12 '12 at 03:02 by Melanie Shebel


People also ask

How can you remove all non-alphanumeric characters from a string?

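A common way to remove all non-alphanumeric characters from a string is with a regular expression. The idea is to match every character that is not alphanumeric with the negated class [^A-Za-z0-9] and replace those matches with an empty string. You can also use [^\w], which in ASCII mode is equivalent to [^a-zA-Z0-9_]. Note that [^\w] keeps underscores, and in engines where \w is ASCII-only (as in Ruby), neither pattern keeps accented letters such as à or ç.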
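In Ruby, the choice of class decides whether the accented letters from the question survive. A small sketch comparing the patterns mentioned above, using an illustrative sample string:

```ruby
# Illustrative input: accented letters, punctuation, and an underscore
s = "Ça va, Marc-André_42!"

# ASCII-only allowlist: accented letters count as "non-alphanumeric" and are dropped
ascii_only = s.gsub(/[^A-Za-z0-9]/, '')    # => "avaMarcAndr42"

# In Ruby, \w means [a-zA-Z0-9_], so underscores survive but accented letters do not
word_chars = s.gsub(/[^\w]/, '')           # => "avaMarcAndr_42"

# [[:alnum:]] matches Unicode letters and digits, so accented letters are kept
unicode_alnum = s.gsub(/[^[:alnum:]]/, '') # => "ÇavaMarcAndré42"
```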

How do I remove all non-alphanumeric characters from a string in Python?

Use the isalnum() method to remove all non-alphanumeric characters from a Python string. The isalnum() method checks whether a given character or string is alphanumeric. Check each character of the string individually, keep the ones that are alphanumeric, and combine them with the join() function.
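Ruby strings have no isalnum method, but the same character-by-character filter can be written by testing each character against the Unicode-aware [[:alnum:]] class. A sketch, where keep_alnum is a hypothetical helper name (String#match? requires Ruby 2.4 or later):

```ruby
# Keep only characters that are letters or digits, then rejoin them.
# Each character is tested individually against the [[:alnum:]] class.
def keep_alnum(str)
  str.each_char.select { |ch| ch.match?(/[[:alnum:]]/) }.join
end

keep_alnum("Ça va, Marc-André!") # => "ÇavaMarcAndré"
```

In Ruby a single gsub call does the same job in one pass; the explicit filter is mainly useful when the keep/drop test is more complex than one character class.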

How do you remove all non-alphanumeric characters from a string in Excel?

Using the Kutools add-in for Excel:
1. Select the range you want to remove non-alphanumeric characters from, and click Kutools > Text > Remove Characters.
2. In the Delete Characters dialog box that appears, check only the Non-alphanumeric option and click OK.
All non-alphanumeric characters are then deleted from the text strings.

How do you replace non-alphanumeric characters with an empty string?

The approach is to use the String.replaceAll method to replace all non-alphanumeric characters with an empty string.


4 Answers

You should use a regex with the correct character property. In this case, you can invert the Alnum class (Alphabetic and numeric character):

"◊¡ Marc-André !◊".gsub(/\p{^Alnum}/, '') # => "MarcAndré"
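To apply this to an array of strings like the one in the question, map over it. A sketch with illustrative sample words; \P{Alnum} is an equivalent spelling of \p{^Alnum}:

```ruby
# Sample input modeled on the question: punctuation mixed with accented letters
words = ["à la carte!", "garçon's", "#1-ranked", "\"quoted\"^"]

# \p{^Alnum} matches any character that is NOT a Unicode letter or digit;
# map returns a new array and leaves `words` untouched
cleaned = words.map { |w| w.gsub(/\p{^Alnum}/, '') }

cleaned # => ["àlacarte", "garçons", "1ranked", "quoted"]
```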

For more complex cases, say you also wanted to keep punctuation, you can build a set of acceptable characters like this:

"◊¡ Marc-André !◊".gsub(/[^\p{Alnum}\p{Punct}]/, '') # => "¡Marc-André!"
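The same allowlist idea works for narrower sets. A sketch keeping only hyphens or only whitespace alongside letters and digits (the second sample string is illustrative):

```ruby
name = "◊¡ Marc-André !◊"

# Allowlist: letters, digits, and hyphens survive; the hyphen is escaped so
# it is read literally rather than as part of a range
hyphenated = name.gsub(/[^\p{Alnum}\-]/, '')         # => "Marc-André"

# Allowlist: letters, digits, and whitespace survive
spaced = "Hi, José! #42".gsub(/[^\p{Alnum}\s]/, '') # => "Hi José 42"
```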

For the full list of character properties, refer to the Ruby Regexp documentation.

answered Nov 15 '22 at 06:11 by Marc-André Lafortune


string.gsub(/[^[:alnum:]]/, "")
answered Nov 15 '22 at 08:11 by Jeremy Roman


The following works for an array (it modifies each string in place):

z = ['asfdå', 'b12398!', 'c98347']
z.each { |s| s.gsub!(/[^[:alnum:]]/, '') } # strip non-alphanumerics in place
p z # => ["asfdå", "b12398", "c98347"]
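One caveat with the in-place loop above: gsub! mutates the original strings (and raises an error on frozen strings, for example under the frozen_string_literal magic comment), and it returns nil when nothing was replaced. A sketch contrasting that return value with the non-destructive gsub:

```ruby
z = ['asfdå', 'b12398!', 'c98347']

# Pitfall: gsub! returns nil when it makes no substitution, so using its
# return value inside map turns unchanged strings into nil (dup avoids mutating z)
broken = z.map { |s| s.dup.gsub!(/[^[:alnum:]]/, '') }
broken  # => [nil, "b12398", nil]

# Non-destructive alternative: gsub always returns a (possibly unchanged) copy
cleaned = z.map { |s| s.gsub(/[^[:alnum:]]/, '') }
cleaned # => ["asfdå", "b12398", "c98347"]
```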

I borrowed Jeremy's suggested regex.

answered Nov 15 '22 at 06:11 by kikuchiyo


You might consider a regular expression.

http://www.regular-expressions.info/ruby.html

I'm assuming that you're using Ruby, since you tagged the question with it. You could iterate over the array, test each string against a regexp, and remove or keep characters depending on the regexp you use.

A regexp you might use could look something like this:

[^.!,'"^#\-]

That matches any single character that is not one of the characters inside the brackets. (The hyphen is escaped; written as ^-# it would be read as a range from ^ to #, which Ruby rejects as an empty range.) However, I suggest that you read up on regular expressions; you might find a better solution once you know their syntax and usage.
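If you really want a blocklist rather than an allowlist, the non-negated class can be passed straight to gsub. A sketch using the characters listed in the question (the BLOCKLIST constant and sample strings are illustrative):

```ruby
# Blocklist approach: remove only the characters listed in the question.
# Inside a character class, ^ is literal when it is not first, and - is
# escaped so it is not read as a range.
BLOCKLIST = /[.!,'"^#\-]/

strings = ["Hello, world!", "it's \"quoted\"", "a^b-c#d", "garçon"]
result = strings.map { |s| s.gsub(BLOCKLIST, '') }
# result => ["Hello world", "its quoted", "abcd", "garçon"]
```

Unlike the [[:alnum:]] answers, a blocklist leaves any character you did not list (for example ? or ;) in place.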

answered Nov 15 '22 at 07:11 by Student