How to remove any non-alphanumeric characters?

Tags: java, regex

I want to remove any non-alphanumeric character from a string, except for certain ones.

StringUtils.replacePattern(input, "\\p{Alnum}", "");

How can I also exclude those certain characters, like .-;?

membersound asked Feb 23 '15 16:02

3 Answers

Use a negated character class, written with a leading ^ inside the brackets:

[^a-zA-Z0-9.\-;]+

This means "match one or more characters that are not in this set". So:

StringUtils.replacePattern(input, "[^a-zA-Z0-9.\\-;]+", "");

Don't forget that the regex is written as a Java string literal, so every regex backslash has to be doubled: the regex \- becomes \\- in source code.
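
If you would rather try this without Commons Lang, here is a minimal sketch using only the JDK's String.replaceAll. The class name NegatedClassDemo and the method keepAllowed are made up for illustration; only the regex comes from the answer above.

```java
// Minimal sketch, JDK only. NegatedClassDemo and keepAllowed are hypothetical names.
class NegatedClassDemo {

    // Removes every character that is not an ASCII letter, a digit, '.', '-' or ';'.
    static String keepAllowed(String input) {
        // The regex escape \- is written as \\- inside a Java string literal.
        return input.replaceAll("[^a-zA-Z0-9.\\-;]+", "");
    }

    public static void main(String[] args) {
        // The comma, spaces, '!' and '_' are removed; '.', '-' and ';' survive.
        System.out.println(keepAllowed("Hello, World! foo-bar.baz;qux_42"));
        // prints "HelloWorldfoo-bar.baz;qux42"
    }
}
```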

mk. answered Sep 23 '22 19:09

You could negate your expression:

\p{Alnum}

By placing it in a negative character class:

[^\p{Alnum}]

That will match any non-alphanumeric character, which you can then replace with "". If you want to allow additional characters, just append them to the character class, e.g.:

[^\p{Alnum}\s]

will not match whitespace characters (\s), so they are left in place.

If you were to replace

[^\p{Alnum}.;-]

with "", the characters ., ; and - would also be kept.
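
As a rough sketch of how that last pattern behaves in Java source (the class and method names below are hypothetical): by default \p{Alnum} is a POSIX class that covers ASCII letters and digits only, so accented letters are stripped too. If that is not what you want, the embedded (?U) flag (UNICODE_CHARACTER_CLASS) makes it Unicode-aware.

```java
// Minimal sketch, JDK only. AlnumClassDemo and both method names are hypothetical.
class AlnumClassDemo {

    // Default behaviour: \p{Alnum} covers ASCII letters and digits only.
    static String stripAscii(String input) {
        // The trailing '-' inside the brackets is a literal hyphen, so no escape is needed.
        return input.replaceAll("[^\\p{Alnum}.;-]", "");
    }

    // With (?U), \p{Alnum} also covers non-ASCII letters and digits.
    static String stripUnicode(String input) {
        return input.replaceAll("(?U)[^\\p{Alnum}.;-]", "");
    }

    public static void main(String[] args) {
        String sample = "caf\u00e9 no.1; a-b!"; // \u00e9 is the letter e with an acute accent
        System.out.println(stripAscii(sample));   // accented letter, spaces and '!' are removed
        System.out.println(stripUnicode(sample)); // accented letter is kept
    }
}
```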

PeterK answered Sep 20 '22 19:09

StringUtils uses Java's standard Pattern class under the hood. If you don't want to import Apache's library and want it to run faster (a Pattern held in a constant is compiled once rather than on every call), you could do:

private static final Pattern NO_ODD_CHARACTERS = Pattern.compile("[^a-zA-Z0-9.\\-;]+");

...

String cleaned = NO_ODD_CHARACTERS.matcher(input).replaceAll("");
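
Put together as something you can compile and run, a sketch might look like the following. The wrapper class PrecompiledCleaner and its clean method are names invented here; the pattern and the replaceAll call are the ones shown above.

```java
import java.util.regex.Pattern;

// Minimal sketch. PrecompiledCleaner and clean are hypothetical names.
final class PrecompiledCleaner {

    // Compiled once when the class is loaded and reused on every call.
    private static final Pattern NO_ODD_CHARACTERS = Pattern.compile("[^a-zA-Z0-9.\\-;]+");

    private PrecompiledCleaner() {
        // Utility class, not meant to be instantiated.
    }

    // Returns the input with everything except ASCII letters, digits, '.', '-' and ';' removed.
    static String clean(String input) {
        return NO_ODD_CHARACTERS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("a_b c-d.e;f#9")); // prints "abc-d.e;f9"
    }
}
```

Pattern instances are immutable and safe to share between threads, so keeping one in a static final field is a common arrangement.
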
Mark Rhodes answered Sep 22 '22 19:09
