replaceAll() method. A common way to remove all non-alphanumeric characters from a String is with regular expressions. The idea is to replace every match of the regular expression [^A-Za-z0-9] with an empty string, so that only alphanumeric characters remain. You can also use the [^\w] regular expression, which is equivalent to [^a-zA-Z_0-9] and therefore keeps underscores as well.
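A minimal sketch of the two patterns, written in JavaScript rather than Java to match the other snippets in this thread. The helper names stripNonAlphanumeric and stripNonWord and the sample string are made up for illustration; they do not come from the original answer.

```javascript
// Hypothetical helper: remove everything that is not A-Z, a-z or 0-9.
function stripNonAlphanumeric(text) {
  return text.replace(/[^A-Za-z0-9]/g, "");
}

// Hypothetical helper: remove everything that is not a word character.
// [^\w] is the same as [^a-zA-Z_0-9], so underscores survive.
function stripNonWord(text) {
  return text.replace(/[^\w]/g, "");
}

const sample = "Hello, World_2024!";
console.log(stripNonAlphanumeric(sample)); // HelloWorld2024
console.log(stripNonWord(sample));         // HelloWorld_2024
```

The only difference between the two results is the underscore, which is why the choice of pattern matters when underscores should be stripped too.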
Alphanumeric characters by definition only comprise the letters A to Z and the digits 0 to 9. Spaces and underscores are usually considered punctuation characters, so no, they shouldn't be allowed. If a field specifically says "alphanumeric characters, space and underscore", then they're included.
Short example: re.sub(r'\W+', '_', 'bla: bla**(bla)') replaces each run of one or more consecutive non-word characters with a single underscore, giving 'bla_bla_bla_'.
Be aware that \W leaves the underscore. A short equivalent for [^a-zA-Z0-9] would be [\W_]:
text.replace(/[\W_]+/g," ");
\W is the negation of the shorthand \w, which stands for [A-Za-z0-9_] (word characters, including the underscore).
Example at regex101.com
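The difference between \W and [\W_] can be checked with a short sketch. The input string and variable names here are invented for the example.

```javascript
const input = "foo_bar: baz**(qux)";

// \W+ treats the underscore as a word character, so "foo_bar" stays intact.
const withNonWord = input.replace(/\W+/g, " ");

// [\W_]+ adds the underscore to the set, so it is replaced as well.
const withNonWordOrUnderscore = input.replace(/[\W_]+/g, " ");

console.log(JSON.stringify(withNonWord));             // "foo_bar baz qux "
console.log(JSON.stringify(withNonWordOrUnderscore)); // "foo bar baz qux "
```

Both results end with a space because the closing parenthesis is replaced too; calling .trim() on the result removes it if that matters.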
Jonny 5 beat me to it. I was going to suggest using \W+ without the \s, as in text.replace(/\W+/g, " "). This covers whitespace as well.
Since the [^a-z0-9] character class matches everything that is not alphanumeric, it matches whitespace characters too!
text.replace(/[^a-z0-9]+/gi, " ");
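As a rough illustration of that point, the sketch below runs the same pattern over a string containing a tab, a carriage return and a newline. The function name normalize and the sample input are assumptions made for this example, and the trailing .trim() is an addition that is not part of the original one-liner.

```javascript
// Hypothetical helper: collapse every run of non-alphanumeric characters
// (punctuation and whitespace alike) into one space.
function normalize(text) {
  return text.replace(/[^a-z0-9]+/gi, " ").trim();
}

console.log(normalize("Hello,\tWorld!\r\nLine 2")); // Hello World Line 2

// Without the i flag, uppercase letters fall outside a-z and are replaced too.
const caseSensitive = "Hello World".replace(/[^a-z0-9]+/g, " ");
console.log(JSON.stringify(caseSensitive)); // " ello orld"
```

The i flag is what lets the lowercase-only class stand in for [A-Za-z0-9].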
Well I think you just need to add a quantifier to each pattern. Also the carriage-return thing is a little funny:
text.replace(/[^a-z0-9]+|\s+/gmi, " ");
Edit: the \s shorthand matches \r and \n too.
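A quick way to confirm this, and to see that the extra |\s+ alternative does not change the result here, is sketched below. The variable names and the sample text are made up for the example.

```javascript
// \s matches carriage return, newline, tab and space alike.
const whitespaceChars = ["\r", "\n", "\t", " "];
const allMatchWhitespace = whitespaceChars.every((ch) => /\s/.test(ch));
console.log(allMatchWhitespace); // true

const sampleText = "a,\r\nb  c";

// Pattern from the answer: negated class plus an explicit \s+ alternative.
const withAlternation = sampleText.replace(/[^a-z0-9]+|\s+/gmi, " ");

// The negated class alone already matches whitespace, so the result is the same.
const withoutAlternation = sampleText.replace(/[^a-z0-9]+/gi, " ");

console.log(withAlternation);    // a b c
console.log(withoutAlternation); // a b c
```

The m flag only changes how ^ and $ behave, and neither appears in the pattern, so it has no effect on this replacement.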