I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.
This is the code I've written:
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", "");
However, the compiler gave me an error message pointing to the s saying it's an illegal escape character. The program compiled OK before I added the \s to the end of the regular expression, but the problem with that was that the spaces between words in the paragraph were stripped out.
How can I fix this error?
A common solution to remove all non-alphanumeric characters from a String is with regular expressions. The idea is to use the regular expression [^A-Za-z0-9] to retain only alphanumeric characters in the string. You can also use [^\w] regular expression, which is equivalent to [^a-zA-Z_0-9] .
To remove all non-alphanumeric characters from a string, call the replace() method, passing it a regular expression that matches all non-alphanumeric characters as the first parameter and an empty string as the second. The replace method returns a new string with all matches replaced.
The approach is to use the String. replaceAll method to replace all the non-alphanumeric characters with an empty string.
You need to double-escape the \
character: "[^a-zA-Z0-9\\s]"
Java will interpret \s
as a Java String escape character, which is indeed an invalid Java escape. By writing \\
, you escape the \
character, essentially sending a single \
character to the regex. This \
then becomes part of the regex escape character \s
.
You need to escape the \ so that the regular expression recognizes \s :
paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", "");
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With