What is the best way to strip all non-alphanumeric characters from a string, using Python?
The solutions presented in the PHP variant of this question will probably work with some minor adjustments, but don't seem very 'pythonic' to me.
For the record, I don't just want to strip periods and commas (and other punctuation), but also quotes, brackets, etc.
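A common way to remove all non-alphanumeric characters from a string is a regular expression. The idea is to match every character in the class [^A-Za-z0-9] and replace it with an empty string, so that only ASCII letters and digits remain. You can also use [^\w], but note two differences. First, it keeps underscores, because \w includes _. Second, it is equivalent to [^a-zA-Z_0-9] only when the re.ASCII flag is set, since in Python 3 \w matches Unicode word characters by default. Use [\W_] if you want underscores removed as well.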
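A minimal sketch of the regex approach with re.sub, assuming Python 3. The helper names strip_non_alnum and strip_non_word are hypothetical and only for illustration. The sample string shows how the ASCII-only class and the Unicode-aware [\W_] class treat an accented letter differently.

```python
import re

# One or more characters that are NOT ASCII letters or digits.
# The trailing + lets re.sub remove each whole run in a single replacement.
NON_ALNUM_ASCII = re.compile(r"[^A-Za-z0-9]+")

# [\W_] means "a non-word character, or an underscore". Since \w includes
# the underscore, adding _ removes it too. Without re.ASCII, \w is Unicode-aware.
NON_ALNUM_UNICODE = re.compile(r"[\W_]+")


def strip_non_alnum(text: str) -> str:
    """Hypothetical helper: keep only ASCII letters and digits."""
    return NON_ALNUM_ASCII.sub("", text)


def strip_non_word(text: str) -> str:
    """Hypothetical helper: keep Unicode letters and digits, drop underscores."""
    return NON_ALNUM_UNICODE.sub("", text)


if __name__ == "__main__":
    sample = 'He said "hi" (to me), café_42!'
    print(strip_non_alnum(sample))  # Hesaidhitomecaf42  (the é is removed)
    print(strip_non_word(sample))   # Hesaidhitomecafé42 (the é is kept)
```

If you want the [\W_] form but restricted to ASCII, compiling it with re.ASCII makes it match the same characters as [^A-Za-z0-9].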
Use the isalnum() method to remove all non-alphanumeric characters from a Python string. The isalnum() method checks whether a given character or string is alphanumeric. We can test each character of the string individually and, if it is alphanumeric, keep it, then combine the kept characters with the join() method.
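A minimal sketch of the character-by-character approach, assuming Python 3. The helper names keep_alnum and keep_alnum_filter are hypothetical. Note that isalnum() is Unicode-aware, so accented letters are kept.

```python
def keep_alnum(text: str) -> str:
    """Hypothetical helper: keep only characters for which str.isalnum() is True."""
    # Test each character individually and join the survivors back together.
    return "".join(ch for ch in text if ch.isalnum())


def keep_alnum_filter(text: str) -> str:
    """Same result using filter(); in Python 3 filter() is lazy, so join it."""
    return "".join(filter(str.isalnum, text))


if __name__ == "__main__":
    print(keep_alnum("Hello, [World] #2024!"))  # HelloWorld2024
    print(keep_alnum_filter("café!"))          # café
```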
isalnum() is a built-in method of Python's str type that checks whether all characters in a string are alphanumeric, that is, whether the string contains only letters, only digits, or both. It returns True if every character is alphanumeric and the string is non-empty; otherwise it returns False.
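A short illustration of the return values described above, assuming Python 3. The list of examples is arbitrary; note the empty string, the space, and the underscore all give False.

```python
# str.isalnum() is called on a string object; it is True only for a
# non-empty string whose characters are all letters or digits.
examples = ["abc123", "abc", "123", "abc 123", "abc_123", "", "café"]

results = {s: s.isalnum() for s in examples}

for s, ok in results.items():
    print(repr(s), ok)
```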
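I just timed some functions out of curiosity. In these tests I'm removing non-alphanumeric characters from string.printable (part of the built-in string module). Using a compiled '[\W_]+' pattern with pattern.sub('', str) was the fastest. Note that in Python 3, filter() returns a lazy iterator rather than a string, so the filter timing below is only comparable on Python 2, or once the result is wrapped in ''.join(...).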
$ python -m timeit -s \
    "import string" \
    "''.join(ch for ch in string.printable if ch.isalnum())"
10000 loops, best of 3: 57.6 usec per loop

$ python -m timeit -s \
    "import string" \
    "filter(str.isalnum, string.printable)"
10000 loops, best of 3: 37.9 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]', '', string.printable)"
10000 loops, best of 3: 27.5 usec per loop

$ python -m timeit -s \
    "import re, string" \
    "re.sub('[\W_]+', '', string.printable)"
100000 loops, best of 3: 15 usec per loop

$ python -m timeit -s \
    "import re, string; pattern = re.compile('[\W_]+')" \
    "pattern.sub('', string.printable)"
100000 loops, best of 3: 11.2 usec per loop
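To repeat the comparison on a current Python 3 interpreter, here is a sketch using the timeit module directly. It assumes Python 3.9+ (for the dict[str, float] annotation), wraps filter() in ''.join(...) so every candidate returns a string, and uses raw strings for the patterns. The function name run_benchmark is hypothetical; the absolute numbers it prints will depend on your machine and Python version, so no results are claimed here.

```python
import re
import string
import timeit

TEXT = string.printable
PATTERN = re.compile(r"[\W_]+")

# The candidates from the timings above, adapted for Python 3.
# filter() is lazy in Python 3, so its result must be joined into a str.
CANDIDATES = {
    "join + isalnum": lambda: "".join(ch for ch in TEXT if ch.isalnum()),
    "join + filter": lambda: "".join(filter(str.isalnum, TEXT)),
    "re.sub [\\W_]": lambda: re.sub(r"[\W_]", "", TEXT),
    "re.sub [\\W_]+": lambda: re.sub(r"[\W_]+", "", TEXT),
    "compiled [\\W_]+": lambda: PATTERN.sub("", TEXT),
}


def run_benchmark(number: int = 10_000) -> dict[str, float]:
    """Return the best-of-5 time per call, in microseconds, for each candidate."""
    results = {}
    for name, func in CANDIDATES.items():
        best = min(timeit.repeat(func, number=number, repeat=5))
        results[name] = best / number * 1e6
    return results


if __name__ == "__main__":
    # Print fastest first.
    for name, usec in sorted(run_benchmark().items(), key=lambda kv: kv[1]):
        print(f"{name:20s} {usec:8.2f} usec per call")
```

All five candidates produce the same string on string.printable (its digits followed by its ASCII letters), so the comparison measures speed only.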