Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Balanced word wrap (Minimum raggedness) in PHP

I'm going to make a word wrap algorithm in PHP. I want to split small chunks of text (short phrases) in n lines of maximum m characters (n is not given, so there will be as much lines as needed). The peculiarity is that lines length (in characters) has to be much balanced as possible across lines.

Example of input text:

How to do things

Wrong output (this is the normal word-wrap behavior), m=6:

How to
do
things

Desired output, always m=6:

How 
to do 
things

Does anyone have suggestions or guidelines on how to implement this function? Basically, I'm searching something for pretty print short phrases on two or three (as much as possible) equal length lines.


Update: It seems I'm searching exactly for a Minimum raggedness word wrap algorithm. But I can't find any implementation in a real programming language (anyone, then I can convert it in PHP).


Update 2: I started a bounty for this. Is it possible that do not exist any public implementation of Minimum raggedness algorithm in any procedural language? I need something written in a way that can be translated into procedural instructions. All I can find now is just a bounch of (generic) equation that however need a optimal searching procedure. I will be grateful also for an implementation that can only approximate that optimal searching algorithm.

like image 977
lorenzo-s Avatar asked Jan 30 '12 21:01

lorenzo-s


2 Answers

I've implemented on the same lines of Alex, coding the Wikipedia algorithm, but directly in PHP (an interesting exercise to me). Understanding how to use the optimal cost function f(j), i.e. the 'recurrence' part, is not very easy. Thanks to Alex for the well commented code.

/** 
 * minimumRaggedness
 *
 * @param string $input paragraph. Each word separed by 1 space.
 * @param int $LineWidth the max chars per line.
 * @param string $lineBreak wrapped lines separator.
 * 
 * @return string $output the paragraph wrapped.
 */
function minimumRaggedness($input, $LineWidth, $lineBreak = "\n")
{
    $words = explode(" ", $input);
    $wsnum = count($words);
    $wslen = array_map("strlen", $words);
    $inf = 1000000; //PHP_INT_MAX;

    // keep Costs
    $C = array();

    for ($i = 0; $i < $wsnum; ++$i)
    {
        $C[] = array();
        for ($j = $i; $j < $wsnum; ++$j)
        {
            $l = 0;
            for ($k = $i; $k <= $j; ++$k)
                $l += $wslen[$k];
            $c = $LineWidth - ($j - $i) - $l;
            if ($c < 0)
                $c = $inf;
            else
                $c = $c * $c;
            $C[$i][$j] = $c;
        }
    }

    // apply recurrence
    $F = array();
    $W = array();
    for ($j = 0; $j < $wsnum; ++$j)
    {
        $F[$j] = $C[0][$j];
        $W[$j] = 0;
        if ($F[$j] == $inf)
        {
            for ($k = 0; $k < $j; ++$k)
            {
                $t = $F[$k] + $C[$k + 1][$j];
                if ($t < $F[$j])
                {
                    $F[$j] = $t;
                    $W[$j] = $k + 1;
                }
            }
        }
    }

    // rebuild wrapped paragraph
    $output = "";
    if ($F[$wsnum - 1] < $inf)
    {
        $S = array();
        $j = $wsnum - 1;
        for ( ; ; )
        {
            $S[] = $j;
            $S[] = $W[$j];
            if ($W[$j] == 0)
                break;
            $j = $W[$j] - 1;
        }

        $pS = count($S) - 1;
        do
        {
            $i = $S[$pS--];
            $j = $S[$pS--];
            for ($k = $i; $k < $j; $k++)
                $output .= $words[$k] . " ";
            $output .= $words[$k] . $lineBreak;
        }
        while ($j < $wsnum - 1);
    }
    else
        $output = $input;

    return $output;
}

?>

like image 180
CapelliC Avatar answered Sep 19 '22 20:09

CapelliC


Quick and dirty, in c++

#include <sstream>
#include <iostream>
#include <vector>
#include <cstdlib>
#include <memory.h>

using namespace std;

int cac[1000][1000];
string res[1000][1000];
vector<string> words;
int M;

int go(int a, int b){
    if(cac[a][b]>= 0) return cac[a][b];
    if(a == b) return 0;

    int csum = -1;
    for(int i=a; i<b; ++i){
    csum += words[i].size() + 1;
    }
    if(csum <= M || a == b-1){
    string sep = "";
        for(int i=a; i<b; ++i){
            res[a][b].append(sep);
            res[a][b].append(words[i]);
            sep = " ";
    }
    return cac[a][b] = (M-csum)*(M-csum);
    }

    int ret = 1000000000;
    int best_sp = -1;
    for(int sp=a+1; sp<b; ++sp){
    int cur = go(a, sp) + go(sp,b);
    if(cur <= ret){
        ret = cur;
        best_sp = sp;
    }
    }
    res[a][b] = res[a][best_sp] + "\n" + res[best_sp][b];
    return cac[a][b] = ret;
}


int main(int argc, char ** argv){
    memset(cac, -1, sizeof(cac));
    M = atoi(argv[1]);
    string word;
    while(cin >> word) words.push_back(word);
    go(0, words.size());
    cout << res[0][words.size()] << endl;
}

Test:

$ echo "The quick brown fox jumps over a lazy dog" |./a.out 10
The quick
brown fox
jumps over
a lazy dog

EDIT: just looked at the wikipedia page for minimum raggedness word wrap. Changed algorithm to the given one (with squared penalties)

like image 41
maniek Avatar answered Sep 20 '22 20:09

maniek