I'm using a framework which returns malformed Strings with "empty" characters from time to time.
"foobar" for example is represented by: [,f,o,o,b,a,r]
The first character is NOT a whitespace (' '), so a System.out.println() would print "foobar" and not " foobar". Yet, the length of the String is 7 instead of 6. Obviously this makes most String methods (equals, split, substring, ...) useless. Is there a way to remove empty characters from a String?
I tried to build a new String like this:
    StringBuilder sb = new StringBuilder();
    for (final char character : malformedString.toCharArray()) {
        if (Character.isDefined(character)) {
            sb.append(character);
        }
    }
    sb.toString();
Unfortunately this doesn't work. Same with the following code:
    StringBuilder sb = new StringBuilder();
    for (final Character character : malformedString.toCharArray()) {
        if (character != null) {
            sb.append(character);
        }
    }
    sb.toString();
I also can't check for an empty character like this:
    if (character == ''){
        //
    }
Obviously there is something wrong with the String, but I can't change the framework I'm using or wait for them to fix it (if it is a bug within their framework). I need to handle this String and sanitize it.
Any ideas?
The replaceAll() method of the String class replaces each substring of this string that matches the given regular expression with the given replacement. For example, replacing " " with "" removes every space character, but it does not touch other whitespace or invisible characters.
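A minimal sketch of that behaviour, assuming (as the answer below suggests) that the stray character is U+FEFF; the class and method names are made up for illustration. It shows why replacing spaces or even all regex whitespace does not help with the malformed String from the question:

```java
class WhitespaceDemo {
    // Removes only the plain space character U+0020.
    static String removeSpaces(String s) {
        return s.replaceAll(" ", "");
    }

    // Removes everything the regex class \s matches (space, tab, newline, ...).
    static String removeWhitespace(String s) {
        return s.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(removeSpaces("f o o"));          // spaces removed
        System.out.println(removeWhitespace("foo\tbar\n")); // tab and newline removed

        // Assumption: the invisible character is U+FEFF, which \s does not match,
        // so the length stays at 7.
        String malformed = "\uFEFF" + "foobar";
        System.out.println(removeWhitespace(malformed).length());
    }
}
```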
Regex would be an appropriate way to sanitize the string from unwanted Unicode characters in this case.
String sanitized = dirty.replaceAll("[\uFEFF-\uFFFF]", "");
This will replace every char in the \uFEFF-\uFFFF range with the empty string.
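To check this against the "foobar" example from the question, here is a small sketch. It assumes the invisible first character is U+FEFF (the question never says which character it is), and the describe helper is hypothetical, added only to make the hidden character visible:

```java
import java.util.stream.Collectors;

class InvisibleCharDemo {
    // Hypothetical helper: lists every UTF-16 unit as U+XXXX so that
    // characters which print as nothing still show up.
    static String describe(String s) {
        return s.chars()
                .mapToObj(c -> String.format("U+%04X", c))
                .collect(Collectors.joining(" "));
    }

    // The replacement from the answer: drop every char from U+FEFF to U+FFFF.
    static String stripHighRange(String dirty) {
        return dirty.replaceAll("[\uFEFF-\uFFFF]", "");
    }

    public static void main(String[] args) {
        // Assumption: the framework prepends U+FEFF to the real content.
        String malformed = "\uFEFF" + "foobar";
        System.out.println(describe(malformed));
        System.out.println(malformed.length());

        String sanitized = stripHighRange(malformed);
        System.out.println(sanitized.length());
        System.out.println(sanitized.equals("foobar"));
    }
}
```

If describe reports a different code point for the first character, the range in the regex has to be adjusted to cover it.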
The [...] construct is called a character class: [aeiou] matches any one of the lowercase vowels, and [^aeiou] matches any character except those.
You can take either of these two approaches:
    replaceAll("[blacklist]", "")
    replaceAll("[^whitelist]", "")
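The two approaches could look roughly like this. The concrete character sets are assumptions chosen for illustration (U+FEFF and U+200B as the blacklist, printable ASCII as the whitelist), not something the framework in the question is known to produce, and the class and method names are hypothetical:

```java
class CharClassSanitizer {
    // Blacklist: remove only the characters named in the class.
    // Assumed set: U+FEFF (byte order mark) and U+200B (zero-width space).
    static String removeListed(String s) {
        return s.replaceAll("[\\uFEFF\\u200B]", "");
    }

    // Whitelist: remove everything that is NOT in the class.
    // Assumed set: printable ASCII, from space (0x20) to tilde (0x7E).
    static String keepPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7E]", "");
    }

    public static void main(String[] args) {
        String malformed = "\uFEFF" + "foo bar" + "\u200B";
        System.out.println(removeListed(malformed).length());
        System.out.println(keepPrintableAscii(malformed).length());
    }
}
```

The blacklist only catches characters you already know about, while the whitelist also strips legitimate non-ASCII text such as accented letters, so which one fits depends on what the Strings are expected to contain.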