replaceAll() method. A common solution to remove all non-alphanumeric characters from a String is with regular expressions. The idea is to use the regular expression [^A-Za-z0-9] to retain only alphanumeric characters in the string. You can also use [^\w] regular expression, which is equivalent to [^a-zA-Z_0-9] .
Using Regular Expression We can use the regular expression [^a-zA-Z0-9] to identify non-alphanumeric characters in a string. Replace the regular expression [^a-zA-Z0-9] with [^a-zA-Z0-9 _] to allow spaces and underscore character.
The approach is to use the String. replaceAll method to replace all the non-alphanumeric characters with an empty string.
Short example re. sub(r'\W+', '_', 'bla: bla**(bla)') replaces one or more consecutive non-alphanumeric characters by an underscore.
Regex to the rescue!
import re
s = re.sub('[^0-9a-zA-Z]+', '*', s)
Example:
>>> re.sub('[^0-9a-zA-Z]+', '*', 'h^&ell`.,|o w]{+orld')
'h*ell*o*w*orld'
The pythonic way.
print "".join([ c if c.isalnum() else "*" for c in s ])
This doesn't deal with grouping multiple consecutive non-matching characters though, i.e.
"h^&i => "h**i
not "h*i"
as in the regex solutions.
Try:
s = filter(str.isalnum, s)
in Python3:
s = ''.join(filter(str.isalnum, s))
Edit: realized that the OP wants to replace non-chars with '*'. My answer does not fit
Use \W
which is equivalent to [^a-zA-Z0-9_]
. Check the documentation, https://docs.python.org/2/library/re.html
import re
s = 'h^&ell`.,|o w]{+orld'
replaced_string = re.sub(r'\W+', '*', s)
output: 'h*ell*o*w*orld'
update: This solution will exclude underscore as well. If you want only alphabets and numbers to be excluded, then solution by nneonneo is more appropriate.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With