
Remove all characters from a string that are not on a whitelist

I am trying to write Java code that removes all unwanted characters from a string and keeps only the whitelisted ones.

Example:

String[] whitelist = {"a", "b", "c"..."z", "0"..."9", "[", "]",...}

I want to keep only letters (lowercase and uppercase), digits, and a few additional characters that I would add to the list. My plan was to run a for() loop over every character in the string and replace it with an empty string if it isn't on the whitelist.

But that isn't a good solution. Could it be done with a pattern (regex) instead? Thanks.
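For comparison, the loop-based approach described above might look roughly like the sketch below. The class name `LoopWhitelist`, the method name `filter`, and the contents of the whitelist string are illustrative assumptions, not code from the original post.

```java
class LoopWhitelist {
    // Hypothetical whitelist: letters, digits, and square brackets.
    private static final String WHITELIST =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789[]";

    // Copies every whitelisted character into a new string, skipping the rest.
    static String filter(String input) {
        StringBuilder kept = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (WHITELIST.indexOf(c) >= 0) {
                kept.append(c);
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(filter("BAD good {} []")); // BADgood[]
    }
}
```

Building the result in a StringBuilder avoids calling replace once per unwanted character.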

PerwinCZ asked Mar 06 '13 13:03


1 Answer

Yes, you can use String.replaceAll, which takes a regex:

String input = "BAD good {} []";
String output = input.replaceAll("[^a-z0-9\\[\\]]", "");
System.out.println(output); // good[]

Or in Guava you could use a CharMatcher:

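// Requires Guava on the classpath: import com.google.common.base.CharMatcher;
CharMatcher matcher = CharMatcher.inRange('a', 'z')
                          .or(CharMatcher.inRange('0', '9'))
                          .or(CharMatcher.anyOf("[]"));
String input = "BAD good {} []";
// retainFrom keeps only the characters that the matcher accepts
String output = matcher.retainFrom(input);
System.out.println(output); // good[]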

Both examples handle only lowercase letters, to keep the demonstration simple. To include uppercase letters, use "[^A-Za-z0-9\\[\\]]" as the regex (adding any other symbols you want). For the CharMatcher, or it with CharMatcher.inRange('A', 'Z').
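As a rough sketch of the regex variant that also keeps uppercase letters, the pattern can be compiled once and reused. The class name `WhitelistFilter` and the method name `keepWhitelisted` are hypothetical, chosen only for this illustration.

```java
import java.util.regex.Pattern;

class WhitelistFilter {
    // Matches any single character that is NOT a letter, a digit, '[' or ']'.
    // Compiled once so repeated calls do not re-parse the regex.
    private static final Pattern NOT_WHITELISTED =
        Pattern.compile("[^A-Za-z0-9\\[\\]]");

    // Removes every character that is not on the whitelist.
    static String keepWhitelisted(String input) {
        return NOT_WHITELISTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(keepWhitelisted("BAD good {} []")); // BADgood[]
    }
}
```

String.replaceAll compiles its regex on every call, so a precompiled Pattern is mainly worthwhile when many strings are filtered.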

Jon Skeet answered Oct 17 '22 14:10