
Java regular expression to remove all non alphanumeric characters EXCEPT spaces

Tags: java, regex

I'm trying to write a regular expression in Java which removes all non-alphanumeric characters from a paragraph, except the spaces between the words.

This is the code I've written:

paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\s]", ""); 

However, the compiler gave me an error message pointing to the s saying it's an illegal escape character. The program compiled OK before I added the \s to the end of the regular expression, but the problem with that was that the spaces between words in the paragraph were stripped out.

How can I fix this error?

Victoria asked Aug 03 '12 13:08




2 Answers

You need to escape the \ character inside the Java string literal: "[^a-zA-Z0-9\\s]"

Java first interprets \s in the string literal as a Java string escape, which the compiler rejects as an invalid escape; that is the error you are seeing. By writing \\, you escape the \ itself, so a single \ reaches the regex engine. There it combines with the s to form the regex whitespace class \s.
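To make the two escaping layers concrete, here is a minimal sketch. The class name, the helper method stripNonAlphanumeric, and the sample sentence are made up for illustration; only the replaceAll call and the pattern come from the question.

```java
class StripNonAlphanumeric {

    // Hypothetical helper: removes every character that is not an ASCII
    // letter, an ASCII digit, or whitespace.
    static String stripNonAlphanumeric(String input) {
        // In Java source, "\\s" is a two-character string: a backslash and an 's'.
        // The regex engine then reads those two characters as the whitespace class \s.
        return input.replaceAll("[^a-zA-Z0-9\\s]", "");
    }

    public static void main(String[] args) {
        // Sample text is an assumption, not taken from the question.
        String paragraphInformation = "Hello, world! It's 2012 (approx.)";
        System.out.println(stripNonAlphanumeric(paragraphInformation));
        // prints: Hello world Its 2012 approx
    }
}
```

Because \s matches any whitespace, tabs and line breaks are kept as well as spaces. If only the plain space character should survive, a literal space inside the character class, as in "[^a-zA-Z0-9 ]", is an alternative. Also note that Java 15 and later accept \s in a string literal as an escape for a single space, so on those versions the original line may compile but would behave like this space-only variant.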

jqno answered Sep 21 '22 04:09


You need to escape the \ in the string literal so that the regular expression engine receives \s:

paragraphInformation = paragraphInformation.replaceAll("[^a-zA-Z0-9\\s]", ""); 

NominSim answered Sep 23 '22 04:09