I have this input String (containing tabs, spaces, and line breaks):
That is a test.
seems to work pretty good? working.
Another test again.
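[Edit]: I should have provided the String itself for easier testing, since Stack Overflow strips special characters (tabs, ...) from the text above. The literal contains the original German sentences, of which the lines above are an English rendering: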
String testContent = "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";
And I want to reach this state:
That is a test.
seems to work pretty good? working.
Another test again.
String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";
Any ideas? Can this be achieved using regexes?
replaceAll("\\s+", " ")
is NOT what I'm looking for. If this regex preserved exactly one newline out of each whitespace run that contains newlines, it would be perfect.
I have tried the following, but it seems suboptimal to me:
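import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Inside a method that declares "throws IOException" (readLine() can throw it).
BufferedReader bufReader = new BufferedReader(new StringReader(testContent));
String line;
StringBuilder newString = new StringBuilder();
while ((line = bufReader.readLine()) != null) {
    // collapse each whitespace run to a single space, then trim both ends
    String temp = line.replaceAll("\\s+", " ").trim();
    // keep only lines that were not whitespace-only, each followed by "\n"
    if (!temp.isEmpty()) {
        newString.append(temp).append("\n");
    }
}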
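For comparison, the same line-by-line idea can be written as a self-contained sketch using String.lines() instead of a BufferedReader. This assumes Java 11 or later, and the class name LineByLine is only for illustration. Like the loop above, it terminates every kept line with "\n", so it reproduces expectedOutput exactly.

```java
import java.util.stream.Collectors;

public class LineByLine {
    // Collapse whitespace per line, trim, drop empty lines,
    // and terminate each kept line with "\n" (as the BufferedReader loop does).
    static String normalize(String input) {
        return input.lines()                                   // Java 11+: splits on \n, \r and \r\n
                .map(line -> line.replaceAll("\\s+", " ").trim())
                .filter(line -> !line.isEmpty())
                .map(line -> line + "\n")
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        String testContent = "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";
        String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";
        System.out.println(normalize(testContent).equals(expectedOutput)); // true
    }
}
```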
In a single regex (plus a small patch for tabs):
input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
.replace("\t"," ");
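A minimal runnable harness for the one-liner, assuming it is wrapped in a helper method (OneRegex.normalize is a hypothetical name). Run on testContent, it yields expectedOutput without the trailing newline, because the \s+$ alternative also removes the final line break.

```java
public class OneRegex {
    // The answer's regex, followed by the tab-to-space patch.
    static String normalize(String input) {
        return input.replaceAll("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*", "$1$2")
                .replace("\t", " ");
    }

    public static void main(String[] args) {
        String testContent = "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";
        String expectedOutput = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.\n";
        String result = normalize(testContent);
        System.out.println(result.equals(expectedOutput));         // false: final "\n" was stripped
        System.out.println((result + "\n").equals(expectedOutput)); // true
    }
}
```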
The regex looks daunting, but in fact decomposes nicely into these parts that are OR-ed together:
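^\s+ matches whitespace at the beginning of the string;
\s+$ matches whitespace at the end of the string;
\s*(\n)\s* matches a whitespace run that contains a newline, and captures one newline;
(\s)\s* matches any other whitespace run, capturing its first character.
The alternatives are tried left to right, so at either end of the string the first two branches win and no newline survives there. Each match has two capture groups, but at most one of them takes part in a given match (for the leading and trailing branches, neither does). This allows me to replace every match with "$1$2", which means "concatenate the two capture groups": one newline for a run that contained a newline, the run's first character for any other run, and nothing at the ends.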
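To see the decomposition at work, this sketch (class and method names are illustrative) walks the matches with Matcher.find() and records both groups for each one. Matcher.group(int) returns null for a group whose alternative did not take part in the match, which is why "$1$2" never contributes more than one of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    private static final Pattern WS = Pattern.compile("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*");

    // For every match, record {whole match, group 1, group 2};
    // a group that did not participate is reported as null.
    static List<String[]> matches(String input) {
        List<String[]> out = new ArrayList<>();
        Matcher m = WS.matcher(input);
        while (m.find()) {
            out.add(new String[] {m.group(), m.group(1), m.group(2)});
        }
        return out;
    }

    // Make whitespace visible when printing.
    static String show(String s) {
        return s == null ? "null" : "\"" + s.replace("\n", "\\n").replace("\t", "\\t") + "\"";
    }

    public static void main(String[] args) {
        for (String[] g : matches("  a\t b\n\n c  ")) {
            System.out.println(show(g[0]) + " -> $1=" + show(g[1]) + ", $2=" + show(g[2]));
        }
    }
}
```

For the sample input it reports four matches: the leading and trailing runs with both groups null, the run "\t " with $2 = "\t", and the run "\n\n " with $1 = "\n".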
The only remaining problem is that "$1$2" copies the captured character as it is: a run that starts with a tab is replaced by a tab, not a space. So I fix that up with a simple non-regex character replacement, .replace("\t", " ").
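If Java 9 or later is available, the tab patch can be folded into the replacement itself by passing a function to Matcher.replaceAll(Function&lt;MatchResult, String&gt;), which computes the replacement per match. This is a swapped-in variant, not part of the original answer, and the class name is hypothetical. The returned strings are still processed like replacement strings, which is harmless here because "\n" and " " contain no $ or backslash.

```java
import java.util.regex.Pattern;

public class FunctionReplacer {
    // Same pattern: leading ws | trailing ws | ws containing a newline (group 1) | other ws (group 2)
    private static final Pattern WS = Pattern.compile("^\\s+|\\s+$|\\s*(\n)\\s*|(\\s)\\s*");

    // Requires Java 9+ for Matcher.replaceAll(Function<MatchResult, String>).
    static String normalize(String input) {
        return WS.matcher(input).replaceAll(mr ->
                mr.group(1) != null ? "\n"    // run contained a newline: keep one newline
                : mr.group(2) != null ? " "   // other run: one space, even if it began with a tab
                : "");                        // leading or trailing run: drop it
    }

    public static void main(String[] args) {
        System.out.println(normalize("  a\t\tb \n\n\tc  ").equals("a b\nc")); // true
    }
}
```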
In 4 steps:
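String result = text
        // 1. compress every run of non-newline whitespace to a single space
        .replaceAll("[\\s&&[^\\n]]+", " ")
        // 2. remove the (now single) space at the beginning or end of each line;
        //    a literal space is used instead of \s, because \s also matches "\n"
        //    and would delete both newlines around a blank line ("a\n\nb" -> "ab")
        .replaceAll("(?m)^ | $", "")
        // 3. compress runs of newlines to a single newline
        .replaceAll("\\n+", "\n")
        // 4. remove a newline at the beginning or end of the string
        .replaceAll("^\n|\n$", "");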
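A self-contained check of the four-step chain (class and method names are illustrative). It contrasts step 2 written with \s against a variant that matches a literal space: under (?m), \s also matches "\n" and ^ matches right after a newline, so on a blank line both surrounding newlines are removed and "a\n\nb" collapses to "ab", while the space variant gives "a\nb". On testContent both variants happen to return expectedOutput without its trailing newline.

```java
public class FourSteps {
    // Four steps, with step 2 matching a literal space.
    static String normalize(String text) {
        return text
                .replaceAll("[\\s&&[^\\n]]+", " ") // 1. non-newline whitespace runs -> one space
                .replaceAll("(?m)^ | $", "")       // 2. drop the space at line start/end
                .replaceAll("\\n+", "\n")          // 3. newline runs -> one newline
                .replaceAll("^\n|\n$", "");        // 4. drop leading/trailing newline
    }

    // Same chain, but step 2 uses \s, which can also consume newlines.
    static String normalizeWithBackslashS(String text) {
        return text
                .replaceAll("[\\s&&[^\\n]]+", " ")
                .replaceAll("(?m)^\\s|\\s$", "")
                .replaceAll("\\n+", "\n")
                .replaceAll("^\n|\n$", "");
    }

    public static void main(String[] args) {
        String testContent = "\n\t\n\t\t\t\n\t\t\tDas ist ein Test.\t\t\t \n\tsoweit scheint das \t\tganze zu? funktionieren.\n\n\n\n\t\t\n\t\t\n\t\t\t \n\t\t\t \n \t\t\t\n \tNoch ein Test.\n \t\n \t\n \t";
        String expected = "Das ist ein Test.\nsoweit scheint das ganze zu? funktionieren.\nNoch ein Test.";
        System.out.println(normalize(testContent).equals(expected));               // true
        System.out.println(normalizeWithBackslashS(testContent).equals(expected)); // true
        System.out.println(normalize("a\n\nb").equals("a\nb"));                    // true
        System.out.println(normalizeWithBackslashS("a\n\nb").equals("ab"));        // true: newlines lost
    }
}
```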