Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Word wrap to X lines instead of maximum width (Least raggedness)

Does anyone know a good algorithm to word wrap an input string to a specified number of lines rather than a set width. Basically to achieve the minimum width for X lines.

e.g. "I would like to be wrapped into two lines"
goes to
"I would like to be
wrapped into two lines"

"I would like to be wrapped into three lines"
goes to
"I would like to
be wrapped into
three lines"

Inserting new lines as required. I can find other word wrap questions but they all have a known width and want to insert as many lines as needed to fit that width. I am after the opposite.

Answers preferable in a .NET language but any language would be helpful. Obviously if there is a framework way to do this I am not aware of let me know.

Edit I have found this since which I think the accepted answer is the solution to my problem but am having difficulty understanding it. Algorithm to divide text into 3 evenly-sized groups any chance someone could convert it to c# or vb.net.

like image 407
PeteT Avatar asked Jun 21 '11 13:06

PeteT


3 Answers

A way of solvng this problem would be using dynamic programming, You can solve this problem using dynamic programming, cf Minimum raggedness algorithm. I used some of the informations you add when you eddited your post with : Algorithm to divide text into 3 evenly-sized groups


Notations:

Let name your text document="word1 word2 .... wordp"

n= number of line required

LineWidth=len(document)/n


Cost function:

First you need to define a cost function of having word[i] to word[j] in the same line , you can take the same as the one as the one on wikipedia, with p=2 for example:

cost function

It represent the distance between the objective length of a line and the actual lenght.

The total cost function for the optimal solution can be defined with the following recursiion relation:

enter image description here


Solving the problem:

You can solve this problem using dynamic programming. I took the code from the link you gave, and changed it a so you see what the program is using.

  1. At stage k you add words to line k.
  2. Then you look at the optimal cost of having word i to j at line k.
  3. Once you've gone from line 1 to n, you tacke the smallest cost in the last step and you have your optimal result:

Here is the result from the code:

D=minragged('Just testing to see how this works.')

number of words: 7
------------------------------------
stage : 0
------------------------------------
word i to j in line 0       TotalCost (f(j))
------------------------------------
i= 0 j= 0           121.0
i= 0 j= 1           49.0
i= 0 j= 2           1.0
i= 0 j= 3           16.0
i= 0 j= 4           64.0
i= 0 j= 5           144.0
i= 0 j= 6           289.0
i= 0 j= 7           576.0
------------------------------------
stage : 1
------------------------------------
word i to j in line 1       TotalCost (f(j))
------------------------------------
i= 0 j= 0           242.0
i= 0 j= 1           170.0
i= 0 j= 2           122.0
i= 0 j= 3           137.0
i= 0 j= 4           185.0
i= 0 j= 5           265.0
i= 0 j= 6           410.0
i= 0 j= 7           697.0
i= 1 j= 2           65.0
i= 1 j= 3           50.0
i= 1 j= 4           58.0
i= 1 j= 5           98.0
i= 1 j= 6           193.0
i= 1 j= 7           410.0
i= 2 j= 4           26.0
i= 2 j= 5           2.0
i= 2 j= 6           17.0
i= 2 j= 7           122.0
i= 3 j= 7           80.0
------------------------------------
stage : 2
------------------------------------
word i to j in line 2       TotalCost (f(j))
------------------------------------
i= 0 j= 7           818.0
i= 1 j= 7           531.0
i= 2 j= 7           186.0
i= 3 j= 7           114.0
i= 4 j= 7           42.0
i= 5 j= 7           2.0
reversing list
------------------------------------
Just testing        12
to see how      10
this works.         11
  • *There fore the best choice is to have words 5 to 7 in last line.(cf stage2)
  • then words 2 to 5 in second line (cf stage1)
  • then words 0 to 2 in first line (cf stage 0).*

Reverse this and you get:

Just testing          12
to see how          10
this works.          11

Here is the code to print the reasonning,(in python sorry I don't use C#...but I someone actually translated the code in C#) :

def minragged(text, n=3):


    P=2
    words = text.split()
    cumwordwidth = [0]
    # cumwordwidth[-1] is the last element
    for word in words:
        cumwordwidth.append(cumwordwidth[-1] + len(word))
    totalwidth = cumwordwidth[-1] + len(words) - 1  # len(words) - 1 spaces
    linewidth = float(totalwidth - (n - 1)) / float(n)  # n - 1 line breaks

    print "number of words:", len(words)
    def cost(i, j):
        """
        cost of a line words[i], ..., words[j - 1] (words[i:j])
        """
        actuallinewidth = max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i])
        return (linewidth - float(actuallinewidth)) ** P

    """
    printing the reasoning and reversing the return list
    """
    F={} # Total cost function

    for stage in range(n):
        print "------------------------------------"
        print "stage :",stage
        print "------------------------------------"
        print "word i to j in line",stage,"\t\tTotalCost (f(j))"
        print "------------------------------------"


        if stage==0:
            F[stage]=[]
            i=0
            for j in range(i,len(words)+1):
                print "i=",i,"j=",j,"\t\t\t",cost(i,j)
                F[stage].append([cost(i,j),0])
        elif stage==(n-1):
            F[stage]=[[float('inf'),0] for i in range(len(words)+1)]
            for i in range(len(words)+1):
                    j=len(words)
                    if F[stage-1][i][0]+cost(i,j)<F[stage][j][0]: #calculating min cost (cf f formula)
                        F[stage][j][0]=F[stage-1][i][0]+cost(i,j)
                        F[stage][j][1]=i
                        print "i=",i,"j=",j,"\t\t\t",F[stage][j][0]            
        else:
            F[stage]=[[float('inf'),0] for i in range(len(words)+1)]
            for i in range(len(words)+1):
                for j in range(i,len(words)+1):
                    if F[stage-1][i][0]+cost(i,j)<F[stage][j][0]:
                        F[stage][j][0]=F[stage-1][i][0]+cost(i,j)
                        F[stage][j][1]=i
                        print "i=",i,"j=",j,"\t\t\t",F[stage][j][0]

    print 'reversing list'
    print "------------------------------------"
    listWords=[]
    a=len(words)
    for k in xrange(n-1,0,-1):#reverse loop from n-1 to 1
        listWords.append(' '.join(words[F[k][a][1]:a]))
        a=F[k][a][1]
    listWords.append(' '.join(words[0:a]))
    listWords.reverse()

    for line in listWords:
        print line, '\t\t',len(line)

    return listWords
like image 77
Ricky Bobby Avatar answered Jun 20 '23 14:06

Ricky Bobby


Here is the accepted solution from Algorithm to divide text into 3 evenly-sized groups converted to C#:

static List<string> Minragged(string text, int n = 3)
{
    var words = text.Split();

    var cumwordwidth = new List<int>();
    cumwordwidth.Add(0);

    foreach (var word in words)
        cumwordwidth.Add(cumwordwidth[cumwordwidth.Count - 1] + word.Length);

    var totalwidth = cumwordwidth[cumwordwidth.Count - 1] + words.Length - 1;

    var linewidth = (double)(totalwidth - (n - 1)) / n;

    var cost = new Func<int, int, double>((i, j) =>
    {
        var actuallinewidth = Math.Max(j - i - 1, 0) + (cumwordwidth[j] - cumwordwidth[i]);
        return (linewidth - actuallinewidth) * (linewidth - actuallinewidth);
    });

    var best = new List<List<Tuple<double, int>>>();

    var tmp = new List<Tuple<double, int>>();
    best.Add(tmp);
    tmp.Add(new Tuple<double, int>(0.0f, -1));
    foreach (var word in words)
        tmp.Add(new Tuple<double, int>(double.MaxValue, -1));

    for (int l = 1; l < n + 1; ++l)
    {
        tmp = new List<Tuple<double, int>>();
        best.Add(tmp);
        for (int j = 0; j < words.Length + 1; ++j)
        {
            var min = new Tuple<double, int>(best[l - 1][0].Item1 + cost(0, j), 0);
            for (int k = 0; k < j + 1; ++k)
            {
                var loc = best[l - 1][k].Item1 + cost(k, j);
                if (loc < min.Item1 || (loc == min.Item1 && k < min.Item2))
                    min = new Tuple<double, int>(loc, k);
            }
            tmp.Add(min);
        }
    }

    var lines = new List<string>();
    var b = words.Length;

    for (int l = n; l > 0; --l)
    {
        var a = best[l][b].Item2;
        lines.Add(string.Join(" ", words, a, b - a));
        b = a;
    }

    lines.Reverse();
    return lines;
}
like image 27
Petar Ivanov Avatar answered Jun 20 '23 13:06

Petar Ivanov


There was a discussion about this exact problem (though it was phrased in a different way) at http://www.perlmonks.org/?node_id=180276.

In the end the best solution was to do a binary search through all possible widths to find the smallest width that wound up with no more than the desired number of columns. If there are n items and the average width is m, then you'll need O(log(n) + log(m)) passes to find the right width, each of which takes O(n) time, for O(n * (log(n) + log(m))). This is probably fast enough with no more need to be clever.

If you wish to be clever, you can create an array of word counts, and cumulative lengths of the words. Then use binary searches on this data structure to figure out where the line breaks are. Creating this data structure is O(n), and it makes all of the passes to figure out the right width be O(log(n) * (log(n) + log(m))) which for reasonable lengths of words is dominated by your first O(n) pass.

If the widths of words can be floating point, you'll need to do something more clever with the binary searches, but you are unlikely to need that particular optimization.

like image 39
btilly Avatar answered Jun 20 '23 14:06

btilly