
Stripping everything but alphanumeric chars from a string in Python

What is the best way to strip all non-alphanumeric characters from a string using Python?

The solutions presented in the PHP variant of this question will probably work with some minor adjustments, but don't seem very 'pythonic' to me.

For the record, I don't just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.

Mark van Lent asked Aug 14 '09 08:08


People also ask

How do you remove everything except alphanumeric characters from a string?

A common way to remove all non-alphanumeric characters from a string is a regular expression. The pattern [^A-Za-z0-9] matches every character that is not an ASCII letter or digit, so replacing each match with an empty string leaves only alphanumeric characters. The pattern [^\w] is similar but not identical: for ASCII matching it is equivalent to [^a-zA-Z_0-9], so it leaves underscores in place.
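As a rough Python sketch of the two patterns above (the function names and the sample string are made up for illustration, and re.ASCII is passed so that \w is limited to ASCII letters, digits and the underscore):

```python
import re

def strip_non_alnum_ascii(text):
    # Delete every character that is not an ASCII letter or digit.
    return re.sub(r"[^A-Za-z0-9]", "", text)

def strip_non_word(text):
    # Delete every non-word character. The underscore counts as a word
    # character, so it survives. re.ASCII limits \w to [a-zA-Z0-9_].
    return re.sub(r"[^\w]", "", text, flags=re.ASCII)

sample = "Hello, (World)_42!"
print(strip_non_alnum_ascii(sample))  # HelloWorld42
print(strip_non_word(sample))         # HelloWorld_42
```

Without re.ASCII, \w in a Python 3 str pattern also matches non-ASCII letters and digits, so the two patterns diverge further on Unicode text.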

How do you remove all non-alphanumeric characters from a string in Python?

Use the isalnum() method. str.isalnum() reports whether a character (or a whole string) is alphanumeric. Test each character of the string individually and combine the ones that pass with join().
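A minimal sketch of the isalnum()/join() approach described above; keep_alnum is a hypothetical helper name and the sample strings are invented:

```python
def keep_alnum(text):
    # Keep only the characters for which str.isalnum() is true and
    # glue them back together into a single string.
    return "".join(ch for ch in text if ch.isalnum())

print(keep_alnum('"Quotes", [brackets] & more: 123.'))  # Quotesbracketsmore123
print(keep_alnum("café!"))                              # café
```

Note that str.isalnum() is Unicode-aware in Python 3, so accented letters such as "é" are kept. If only ASCII letters and digits should remain, a character class like [^A-Za-z0-9] may be a better fit.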

How do you find non alphanumeric characters in Python?

isalnum() is a built-in Python string method that checks whether all characters in a string are alphanumeric, that is, whether the string consists only of letters, digits, or both. If every character is alphanumeric and the string is not empty, isalnum() returns True; otherwise it returns False. To find the non-alphanumeric characters themselves, test each character individually.
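The behaviour of isalnum() on whole strings, and one possible way to list the offending characters, can be sketched as follows (non_alnum_chars is a hypothetical helper, not part of the standard library):

```python
def non_alnum_chars(text):
    # Collect each non-alphanumeric character once, in order of first appearance.
    seen = []
    for ch in text:
        if not ch.isalnum() and ch not in seen:
            seen.append(ch)
    return seen

print("abc123".isalnum())             # True
print("abc 123".isalnum())            # False (the space is not alphanumeric)
print("".isalnum())                   # False (an empty string never passes)
print(non_alnum_chars("a-b_c d-e"))   # ['-', '_', ' ']
```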


1 Answer

I just timed some functions out of curiosity. In these tests I'm removing non-alphanumeric characters from the string string.printable (part of the built-in string module). Compiling the pattern '[\W_]+' once and calling pattern.sub('', text) on the input was the fastest approach.

$ python -m timeit -s \
     "import string" \
     "''.join(ch for ch in string.printable if ch.isalnum())"
10000 loops, best of 3: 57.6 usec per loop

$ python -m timeit -s \
     "import string" \
     "filter(str.isalnum, string.printable)"
10000 loops, best of 3: 37.9 usec per loop

$ python -m timeit -s \
     "import re, string" \
     "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s \
     "import re, string" \
     "re.sub('[\W_]+', '', string.printable)"
100000 loops, best of 3: 15 usec per loop

$ python -m timeit -s \
     "import re, string; pattern = re.compile('[\W_]+')" \
     "pattern.sub('', string.printable)"
100000 loops, best of 3: 11.2 usec per loop
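The timings above appear to come from Python 2, where filter() on a string returns a string. A rough Python 3 sketch of the same comparison, using the timeit module directly, might look like this. The function names and the loop count are assumptions, and the numbers it prints will differ by machine and interpreter version:

```python
import re
import string
import timeit

# Compile once so the pattern is not looked up again on every call.
PATTERN = re.compile(r"[\W_]+")

def with_join(text):
    return "".join(ch for ch in text if ch.isalnum())

def with_filter(text):
    # In Python 3, filter() returns an iterator, so join() is needed.
    return "".join(filter(str.isalnum, text))

def with_compiled_regex(text):
    return PATTERN.sub("", text)

# string.printable starts with the digits and ASCII letters, so that
# prefix is what every approach should return.
expected = string.digits + string.ascii_letters

for func in (with_join, with_filter, with_compiled_regex):
    assert func(string.printable) == expected
    seconds = timeit.timeit(lambda: func(string.printable), number=1000)
    print(f"{func.__name__}: {seconds / 1000 * 1e6:.1f} usec per loop")
```

The three functions agree on string.printable because it is pure ASCII. On other input they can differ, so it is worth checking results on representative data before picking one on speed alone.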
Otto Allmendinger answered Oct 06 '22 17:10