I want to remove all characters from a string other than a-z and A-Z. I created the following function for this, and it works fine.
public String stripGarbage(String s) {
    String good = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz";
    String result = "";
    for (int i = 0; i < s.length(); i++) {
        if (good.indexOf(s.charAt(i)) >= 0) {
            result += s.charAt(i);
        }
    }
    return result;
}
Can anyone suggest a better way to achieve the same result? Perhaps a regex would be a better option.
Regards
Harry
^ means "Match the start of the string" (the position before the first character in the string), and $ means "Match the end of the string" (the position after the last character in the string). Both are called anchors and ensure that the entire string is matched instead of just a substring.
In regex, an uppercase metacharacter denotes the inverse of its lowercase counterpart: for example, \w for a word character and \W for a non-word character; \d for a digit and \D for a non-digit.
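To make the effect of the anchors and the inverse classes concrete, here is a minimal sketch. The class and method names (RegexAnchorsDemo, isEntirelyAlphanumeric and so on) are made up for illustration and are not from the question. One Java-specific caveat: by default $ also matches just before a trailing line terminator, so \z is the stricter end-of-input anchor.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class RegexAnchorsDemo {
    // Anchored: the pattern must span the input from start (^) to end ($).
    // Note: in Java, $ also matches just before a final line terminator;
    // use \z if that must be ruled out.
    static final Pattern ANCHORED = Pattern.compile("^[a-zA-Z0-9]+$");

    // Unanchored: a match anywhere inside the input is enough.
    static final Pattern UNANCHORED = Pattern.compile("[a-zA-Z0-9]+");

    static boolean isEntirelyAlphanumeric(String s) {
        return ANCHORED.matcher(s).find();
    }

    static boolean containsAlphanumeric(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // \d matches a digit, so this removes every digit.
    static String removeDigits(String s) {
        return s.replaceAll("\\d", "");
    }

    // \D is the inverse (any non-digit), so this keeps only the digits.
    static String keepDigits(String s) {
        return s.replaceAll("\\D", "");
    }

    public static void main(String[] args) {
        System.out.println(isEntirelyAlphanumeric("abc123"));  // true
        System.out.println(isEntirelyAlphanumeric("abc-123")); // false
        System.out.println(containsAlphanumeric("abc-123"));   // true
        System.out.println(removeDigits("a1b2"));              // ab
        System.out.println(keepDigits("a1b2"));                // 12
    }
}
```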
Here you go:
result = s.replaceAll("[^a-zA-Z0-9]", "");
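If the method is called often, one possible refinement is to compile the pattern once instead of letting String.replaceAll compile it on every call. This is only a sketch: the class name RegexStripper and the constant name are made up here, and the method keeps the stripGarbage name from the question.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper class; only stripGarbage's name comes from the question.
class RegexStripper {
    // Compiled once and reused across calls.
    // [^a-zA-Z0-9] matches any single character that is NOT an ASCII letter or digit.
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^a-zA-Z0-9]");

    static String stripGarbage(String s) {
        // Replace every non-alphanumeric character with the empty string.
        return NON_ALPHANUMERIC.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripGarbage("Hello, World! 123")); // HelloWorld123
    }
}
```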
But if you understand your code and it's readable then maybe you have the best solution:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
The following should be faster than anything using regex, and faster than your initial attempt.
public String stripGarbage(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
        char ch = s.charAt(i);
        if ((ch >= 'A' && ch <= 'Z') ||
            (ch >= 'a' && ch <= 'z') ||
            (ch >= '0' && ch <= '9')) {
            sb.append(ch);
        }
    }
    return sb.toString();
}
Key points:
It is significantly faster to use a StringBuilder than string concatenation in a loop. (The latter generates N - 1 garbage strings and copies N * (N + 1) / 2 characters to build a String containing N characters.)
If you have a good estimate of the length of the result String, it is a good idea to preallocate the StringBuilder to hold that number of characters. (But if you don't have a good estimate, the cost of the internal reallocations etc. amortizes to O(N) where N is the final string length ... so this is not normally a major concern.)
Testing a character against (up to) 3 character ranges will be significantly faster on average than searching for a character in a 62-character String.
A switch statement might be faster especially if there are more character ranges. However, in this case it will take many more lines of code to list the cases for all of the letters and digits.
If the non-garbage characters match existing predicates of the Character class (e.g. Character.isLetter(char) etc.) you could use those. This would be a good option if you wanted to match any letter or digit ... rather than just ASCII letters and digits.
Other alternatives to consider are using a HashSet<Character> or a boolean[] indexed by character that has been pre-populated with the non-garbage characters. These approaches work well if the set of non-garbage characters is not known at compile time.
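Two of the alternatives from the key points above could look roughly like this: a boolean[] lookup table built at run time, and a filter based on Character.isLetterOrDigit(char). This is a sketch under assumptions, not part of the original answer: the class name CharFilters and all method names are invented for illustration.

```java
// Hypothetical helper class; all names here are illustrative only.
class CharFilters {
    // Builds a lookup table indexed by char value; true marks a character to keep.
    // The table covers every possible char value (65536 entries).
    static boolean[] buildTable(String allowed) {
        boolean[] table = new boolean[Character.MAX_VALUE + 1];
        for (int i = 0; i < allowed.length(); i++) {
            table[allowed.charAt(i)] = true;
        }
        return table;
    }

    // Keeps only the characters whose table entry is true.
    static String stripWithTable(String s, boolean[] keep) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (keep[ch]) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    // Keeps any letter or digit, not just ASCII ones.
    // Works per char, so characters outside the Basic Multilingual Plane
    // (stored as surrogate pairs) are dropped by this simple version.
    static String stripNonLettersOrDigits(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (Character.isLetterOrDigit(ch)) {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        boolean[] keep = buildTable("abc"); // the allowed set could come from configuration
        System.out.println(stripWithTable("a-b_c 1", keep));       // abc
        System.out.println(stripNonLettersOrDigits("foo, bar 42!")); // foobar42
    }
}
```

The table is built once and can be reused for many calls, which is where it pays off compared with rebuilding a set of allowed characters each time.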