Remove all non-"word characters" from a String in Java, leaving accented characters?

Apparently Java's regex flavor treats umlauts and other accented characters as non-"word characters".

        "TESTÜTEST".replaceAll( "\\W", "" )

returns "TESTTEST" for me. What I want is to remove only the truly non-"word characters". Is there any way to do this without having something along the lines of

         "[^A-Za-z0-9äöüÄÖÜßéèáàúùóò]"

only to realize I forgot ô?

718

asked Oct 23 '09 08:10

Epaga

1 Answer

Use [^\p{L}\p{Nd}]+ - this matches all (Unicode) characters that are neither letters nor (decimal) digits.

In Java:

String resultString = subjectString.replaceAll("[^\\p{L}\\p{Nd}]+", "");

Edit:

I changed \p{N} to \p{Nd} because the former also matches some number symbols like ¼; the latter doesn't. See it on regex101.com.
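As a rough illustration of the pattern above, here is a minimal, self-contained sketch. The class and method names (`UnicodeWordFilter`, `keepLettersAndDigits`, `keepUnicodeWordChars`) are made up for this example. The second method is not part of the answer; it shows an alternative, assuming Java 7 or later, where the `Pattern.UNICODE_CHARACTER_CLASS` flag makes `\W` follow the Unicode definition of a word character (which, unlike the first pattern, keeps the underscore).

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the answer.
public class UnicodeWordFilter {

    // Runs of characters that are neither Unicode letters nor decimal digits.
    private static final Pattern NON_LETTER_OR_DIGIT =
            Pattern.compile("[^\\p{L}\\p{Nd}]+");

    // Alternative (assumes Java 7+): \W interpreted with Unicode semantics.
    private static final Pattern NON_WORD_UNICODE =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    public static String keepLettersAndDigits(String input) {
        return NON_LETTER_OR_DIGIT.matcher(input).replaceAll("");
    }

    public static String keepUnicodeWordChars(String input) {
        return NON_WORD_UNICODE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // \u00DC is Ü, \u00EB is ë, \u00F4 is ô (escaped to avoid source-encoding issues).
        System.out.println(keepLettersAndDigits("TEST\u00DCTEST"));         // TESTÜTEST
        System.out.println(keepLettersAndDigits("T\u00EBst-\u00F4 42!"));   // Tëstô42
        System.out.println(keepLettersAndDigits("a_b"));                    // ab
        System.out.println(keepUnicodeWordChars("a_b"));                    // a_b
    }
}
```

One caveat with the first pattern: an accent stored as a separate combining mark (for example `e` followed by U+0301) is not in `\p{L}`, so the mark is stripped and only the base letter remains. If that matters for your input, normalizing to NFC with `java.text.Normalizer` before filtering may help.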

answered Oct 29 '22 18:10

Tim Pietzcker


Tags: java, string, regex
