Break a long string into lines with proper word wrapping

Tags:

java

regex

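 import java.util.ArrayList;
 import java.util.List;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

 String original = "This is a sentence.Rajesh want to test the application for the word split.";
 List<String> matchList = new ArrayList<>();
 // Up to 10 characters, followed by a whitespace character or the end of input.
 Pattern regex = Pattern.compile(".{1,10}(?:\\s|$)", Pattern.DOTALL);
 Matcher regexMatcher = regex.matcher(original);
 while (regexMatcher.find()) {
     matchList.add(regexMatcher.group());
 }
 System.out.println("Match List " + matchList);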

I need to split text into an array of lines, each no longer than 10 characters, without breaking a word at the end of a line.

I used the logic above, but there is a problem: when a word would be broken at the end of a line, the matcher skips ahead to a position from which it can reach the next whitespace within 10 characters, so the beginning of that word is dropped.

For example, the actual sentence is "This is a sentence.Rajesh want to test the application for the word split." but the code produces:

Match List [This is a , nce.Rajesh , want to , test the , pplication , for the , word , split.]
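To see where the missing characters go, here is a small diagnostic sketch (the SkippedTextDemo class and skippedText method are hypothetical names) that records the text lying between the end of one match and the start of the next. Matcher.find() is not required to start exactly where the previous match ended: if no match is possible at that position, it scans forward to the first position that can match, and the characters in between are never returned.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkippedTextDemo {
    // Returns the pieces of text that lie between consecutive matches,
    // i.e. the characters Matcher.find() stepped over without matching.
    static List<String> skippedText(String text, Pattern pattern) {
        List<String> skipped = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        int previousEnd = 0;
        while (matcher.find()) {
            if (matcher.start() > previousEnd) {
                skipped.add(text.substring(previousEnd, matcher.start()));
            }
            previousEnd = matcher.end();
        }
        // Any text after the last match is lost as well.
        if (previousEnd < text.length()) {
            skipped.add(text.substring(previousEnd));
        }
        return skipped;
    }

    public static void main(String[] args) {
        String original = "This is a sentence.Rajesh want to test the application for the word split.";
        Pattern regex = Pattern.compile(".{1,10}(?:\\s|$)", Pattern.DOTALL);
        System.out.println("Skipped " + skippedText(original, regex)); // Skipped [sente, a]
    }
}
```

For the sentence in the question this reports "sente" and "a", which are exactly the characters missing from "nce.Rajesh " and "pplication ".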

asked Jan 26 '26 08:01 by Raja


2 Answers

OK, so I've managed to get the following working with a maximum line length of 10, and it also splits words longer than 10 characters correctly!

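import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String original = "This is a sentence. Rajesh want to test the applications for the word split handling.";
List<String> matchList = new ArrayList<>();
// First alternative: up to 10 characters ending at whitespace or end of input.
// Second alternative: fallback that cuts a word longer than 10 characters.
Pattern regex = Pattern.compile("(.{1,10}(?:\\s|$))|(.{0,10})", Pattern.DOTALL);
Matcher regexMatcher = regex.matcher(original);
while (regexMatcher.find()) {
  matchList.add(regexMatcher.group());
}
System.out.println("Match List " + matchList);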

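This is the result (one list element per line):

This is a 
sentence. 
Rajesh 
want to 
test the 
applicatio
ns for the 
word split 
handling.

Every element except "applicatio" and "handling." keeps its trailing space, and the list also ends with an empty string, because the fallback (.{0,10}) matches zero characters at the end of the input.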
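The fallback alternative (.{0,10}) is allowed to match zero characters, so at the very end of the input find() also returns an empty string. A minimal sketch of the same regex approach, assuming you want the width as a parameter and trailing whitespace removed (the RegexWrap class and wrap method are hypothetical names): requiring at least one character in the fallback keeps the long-word splitting but stops the empty match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWrap {
    // Splits text into lines of at most `width` characters after trimming.
    // Alternative 1: 1..width chars followed by whitespace or end of input.
    // Alternative 2: fallback for words longer than `width`; it needs at least
    // one character, so no empty match is produced at the end of the input.
    static List<String> wrap(String text, int width) {
        if (width < 1) {
            throw new IllegalArgumentException("width must be positive");
        }
        Pattern pattern = Pattern.compile(
                "(.{1," + width + "}(?:\\s|$))|(.{1," + width + "})", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(text);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            String line = matcher.group().trim();
            if (!line.isEmpty()) { // drop matches that were only whitespace
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String original = "This is a sentence. Rajesh want to test the applications for the word split handling.";
        System.out.println(wrap(original, 10));
    }
}
```

For the input above this yields [This is a, sentence., Rajesh, want to, test the, applicatio, ns for the, word split, handling.]. The design trade-off is unchanged: a word longer than the width is still cut in the middle rather than moved to its own line.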
answered Jan 28 '26 20:01 by Rafe


This question was tagged as Groovy at some point. Assuming a Groovy answer is still valid and you are not worried about preserving runs of whitespace (e.g. several consecutive spaces):

def splitIntoLines(text, maxLineSize) {
    def words = text.split(/\s+/)
    def lines = ['']
    words.each { word ->
        def lastLine = (lines[-1] + ' ' + word).trim()
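        if (lastLine.size() <= maxLineSize || lines[-1].isEmpty())
            // Extend the last line (or fill the initial empty line, even with a word longer than maxLineSize).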
            lines[-1] = lastLine
        else
            // Add word as new line.
            lines << word
    }
    lines
}

// Tests...
def original = "This is a sentence. Rajesh want to test the application for the word split."

assert splitIntoLines(original, 10) == [
    "This is a",
    "sentence.",
    "Rajesh",
    "want to",
    "test the",
    "application",
    "for the",
    "word",
    "split."
]
assert splitIntoLines(original, 20) == [
    "This is a sentence.",
    "Rajesh want to test",
    "the application for",
    "the word split."
]
assert splitIntoLines(original, original.size()) == [original]
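Since the question itself is about Java, here is a rough Java translation of the same greedy algorithm, a sketch that mirrors the Groovy splitIntoLines above (the GreedyWrap class name is hypothetical). Like the Groovy version, it collapses runs of whitespace and puts a word longer than the limit on its own line instead of splitting it.

```java
import java.util.ArrayList;
import java.util.List;

class GreedyWrap {
    // Greedy word wrap: append each word to the current line while the result
    // still fits within maxLineSize; otherwise start a new line with that word.
    // A word longer than maxLineSize ends up alone on its line (not split).
    static List<String> splitIntoLines(String text, int maxLineSize) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // only happens when the text is empty or all whitespace
            }
            if (current.length() == 0) {
                current.append(word);
            } else if (current.length() + 1 + word.length() <= maxLineSize) {
                current.append(' ').append(word);
            } else {
                lines.add(current.toString());
                current.setLength(0);
                current.append(word);
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String original = "This is a sentence. Rajesh want to test the application for the word split.";
        System.out.println(splitIntoLines(original, 10));
        System.out.println(splitIntoLines(original, 20));
    }
}
```

With widths 10 and 20 this returns the same lines as the Groovy assertions above. One small difference: for empty or all-whitespace input it returns an empty list, whereas the Groovy version returns a list holding one empty string.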
answered Jan 28 '26 22:01 by epidemian