
java string split on all non-alphanumeric except apostrophes

Tags: java, regex

I want to split a string in Java on any non-alphanumeric characters.

Currently I am doing it like this:

words = Str.split("\\W+");

However, I want to keep apostrophes (') inside the words. Is there a regular expression that preserves apostrophes but still splits on everything else? Thanks.

Badmiral asked Jul 04 '12 16:07


People also ask

How do you remove everything except alphanumeric characters from a string?

A common way to remove all non-alphanumeric characters from a String is a regular expression. The pattern [^A-Za-z0-9] matches every character that is not an ASCII letter or digit, so replacing its matches with the empty string retains only the alphanumeric characters. You can also use [^\w], which is equivalent to [^a-zA-Z_0-9] and therefore keeps underscores as well.
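
As a minimal sketch of the removal described above, assuming plain ASCII input: the class name StripNonAlphanumeric and the helper strip are made up for this illustration, while String.replaceAll is the standard library call.

```java
public class StripNonAlphanumeric {
    // Hypothetical helper: deletes every character that is not an ASCII letter or digit.
    static String strip(String input) {
        // [^A-Za-z0-9] matches the characters to remove; they are replaced with nothing.
        return input.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        // The comma, spaces, exclamation mark and underscore are all dropped.
        System.out.println(strip("Hello, World! 123_abc")); // prints HelloWorld123abc
    }
}
```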

How do you split a string by non-alphanumeric?

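To split the string on non-alphanumeric characters, you can use the predefined character class \W, equivalent to [^a-zA-Z0-9_]. Note that \W does not match the underscore, so underscores stay inside the resulting tokens.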

What does split("\\s+") do in Java?

split("\\s+") splits the string into an array of strings, using any run of whitespace as the separator. \s+ is a regular expression for one or more whitespace characters (spaces, tabs, newlines and so on), not just spaces.
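
A short sketch of whitespace splitting, under the assumption that leading and trailing whitespace should be ignored. The class and method names are hypothetical. The trim() call matters because a delimiter at the very start of the input otherwise produces an empty first element.

```java
import java.util.Arrays;

public class WhitespaceSplit {
    // Hypothetical helper: splits on runs of whitespace after trimming both ends.
    static String[] splitOnWhitespace(String input) {
        // Without trim(), "  a b".split("\\s+") would return ["", "a", "b"].
        return input.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String sample = "  one two\t\tthree\nfour ";
        System.out.println(Arrays.toString(splitOnWhitespace(sample)));
        // prints [one, two, three, four]
    }
}
```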

How do you split a string at a certain character?

To split a string on a specific character in Java, call the split() method on the string and pass the delimiter as the argument. The method returns a String array whose elements are the pieces between delimiters. Because the argument is interpreted as a regular expression, metacharacters such as . or | must be escaped (for example "\\.").
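
One way to sidestep the escaping issue, sketched here with hypothetical names (LiteralSplit, splitOnLiteral), is to wrap the delimiter in Pattern.quote so that it is always treated as literal text:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralSplit {
    // Hypothetical helper: treats the delimiter as literal text, not as a regex.
    static String[] splitOnLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // An unquoted "." would match every character and leave nothing to return.
        System.out.println(Arrays.toString(splitOnLiteral("a.b.c", "."))); // prints [a, b, c]
    }
}
```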


2 Answers

words = Str.split("[^\\w']+"); 

Just add the apostrophe to a negated character class. \W is equivalent to [^\w], so [^\w'] matches everything \W does except the apostrophe.

Note, however, that \w also includes the underscore. If you want to split on underscores as well, use [^a-zA-Z0-9'] instead.
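
To make the underscore difference concrete, here is a small sketch that runs both patterns on the same input. The class name, method names and sample sentence are invented for the illustration; only the two regular expressions come from the answer.

```java
import java.util.Arrays;

public class ApostropheSplit {
    // Splits on anything that is not a word character or an apostrophe.
    // Underscores count as word characters, so they stay inside tokens.
    static String[] splitKeepingUnderscores(String s) {
        return s.split("[^\\w']+");
    }

    // Splits on anything that is not an ASCII letter, digit or apostrophe,
    // which means underscores act as separators too.
    static String[] splitOnUnderscoresToo(String s) {
        return s.split("[^a-zA-Z0-9']+");
    }

    public static void main(String[] args) {
        String sample = "Don't panic, it's snake_case!";
        System.out.println(Arrays.toString(splitKeepingUnderscores(sample)));
        // prints [Don't, panic, it's, snake_case]
        System.out.println(Arrays.toString(splitOnUnderscoresToo(sample)));
        // prints [Don't, panic, it's, snake, case]
    }
}
```

If the input can start with a separator (for example a leading quote or space), the first element of the result will be an empty string, so callers may want to filter it out.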

Amber answered Sep 19 '22 19:09


For basic English characters, use

words = Str.split("[^a-zA-Z0-9']+"); 

If you also want to keep words that contain accented letters (such as fiancé), or to handle languages written with non-English letters, use \p{L}, which matches any Unicode letter:

words = Str.split("[^\\p{L}0-9']+"); 
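
The following sketch compares the ASCII-only pattern with the \p{L} pattern on a string containing é. The class name, method names and sample text are assumptions made for the example, and the accented letter is written as the escape \u00e9 so the source compiles regardless of file encoding.

```java
import java.util.Arrays;

public class UnicodeSplit {
    // ASCII-only: an accented letter is treated as a separator.
    static String[] splitAscii(String s) {
        return s.split("[^a-zA-Z0-9']+");
    }

    // \p{L} matches any Unicode letter, so accented letters stay inside words.
    static String[] splitUnicode(String s) {
        return s.split("[^\\p{L}0-9']+");
    }

    public static void main(String[] args) {
        String sample = "My fianc\u00e9's caf\u00e9"; // "My fiancé's café"
        System.out.println(Arrays.toString(splitAscii(sample)));
        // the words are cut at each é: [My, fianc, 's, caf]
        System.out.println(Arrays.toString(splitUnicode(sample)));
        // the words stay whole: [My, fiancé's, café]
    }
}
```

One caveat: \p{L} covers precomposed letters such as U+00E9. If the text stores é as a plain e followed by a combining accent, the combining mark is not a letter and will still act as a separator, so normalizing the input first (or adding \p{M} to the class) may be needed.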

Ωmega answered Sep 21 '22 19:09