
Algorithm to Convert text from Left Align to Justify

Recently I was asked in an interview to design an algorithm that converts an input string which is left-aligned (with spaces at the end of each line) into justified text (with no white space at the end of a complete line), similar to what MS Word does. I proposed a basic solution that involved counting the words and spaces on each line and then distributing the trailing space equally among the gaps between words (he asked me to assume that fractional spaces can be distributed between words). He then asked me to consider the whole paragraph and modify the text so that its appearance does not suffer when an unequal distribution of spaces between words is inevitable.

I was unable to think of a proper solution at that moment. Later he told me that this is done with dynamic programming. I am not sure whether there is already a standard algorithm for this. If there is, please share a useful link.

PS: The solution I proposed was a very abstract idea, so I don't have any code to show what I have already tried. Justification: http://en.wikipedia.org/wiki/Justification_(typesetting)

Rajat Shah asked Aug 06 '13 20:08


2 Answers

The standard algorithm for breaking paragraphs into lines is probably still the algorithm of Knuth & Plass, used by Knuth's typesetting system TeX. The algorithm, which 'avoids backtracking by a judicious use of the techniques of dynamic programming', is described in

Donald E. Knuth and Michael F. Plass, "Breaking Paragraphs into Lines", Software: Practice and Experience 11 (1981) 1119-1184, DOI: 10.1002/spe.4380111102, also available in Digital Typography, Ch. 3, pp. 67–155.

The algorithm is based on considering each possible line break, starting from the beginning of the paragraph, and for each one finding the sequence of preceding line breaks that gives the best result so far. Because the best sequence ending at a given line break does not depend on what follows it, only the potential starting points for the current line have to be considered when a new potential breakpoint is added, which leads to an efficient algorithm.

A simplified version of the algorithm (without hyphenation, for example) can be described like this:

Add start of paragraph to list of active breakpoints
For each possible breakpoint (space) B_n, starting from the beginning:
   For each breakpoint in active list as B_a:
      If B_a is too far away from B_n:
          Delete B_a from active list
      else:
          Calculate badness of line from B_a to B_n
          If using B_a minimizes cumulative badness from start to B_n:
             Record B_a and cumulative badness as best path to B_n
   If a best path to B_n was recorded:
      Add B_n to active list

The result is a linked list of breakpoints to use.
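
As a rough illustration of the loop above, here is a minimal Python sketch. It is not the Knuth & Plass implementation: it assumes a monospaced font (widths counted in characters), breaks only between words, and uses the cube of the unused width as a stand-in for badness, with the last line costing nothing. The name break_lines and its parameters are made up for this example.

```python
def break_lines(words, width):
    """Choose line breaks that minimize total cubed slack (last line is free)."""
    n = len(words)
    # prefix[i] = total length of words[0:i], so line lengths are a subtraction
    prefix = [0]
    for w in words:
        prefix.append(prefix[-1] + len(w))

    # Breakpoint k means "break before word k"; 0 is the paragraph start.
    best = {0: (0, None)}  # breakpoint -> (cumulative cost, previous breakpoint)
    active = [0]           # breakpoints that can still start a line

    for j in range(1, n + 1):
        candidate = None
        for a in list(active):  # iterate over a copy, since we delete entries
            natural = prefix[j] - prefix[a] + (j - a - 1)  # words + single spaces
            if natural > width:
                active.remove(a)  # too far away now, and for every later j
                continue
            slack = width - natural
            cost = 0 if j == n else slack ** 3  # assumed stand-in for badness
            total = best[a][0] + cost
            if candidate is None or total < candidate[0]:
                candidate = (total, a)
        if candidate is not None:
            best[j] = candidate
            active.append(j)

    if n not in best:
        raise ValueError("no feasible set of breaks for this width")

    # Follow the back-links from the end of the paragraph to the start.
    lines, j = [], n
    while j > 0:
        a = best[j][1]
        lines.append(" ".join(words[a:j]))
        j = a
    return lines[::-1]


print(break_lines("aaa bb cc ddddd".split(), 6))
```

With a width of 6, a greedy first-fit approach would produce "aaa bb" / "cc" / "ddddd", leaving one very loose line; the sketch instead picks "aaa" / "bb cc" / "ddddd", whose total cost is lower. The returned lines are only the chosen breaks; stretching the spaces to fill each line is a separate step.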

The badness of lines under consideration can be calculated like this:

Each space is assigned a nominal width, a stretchability, and a shrinkability.
The badness is then calculated as the ratio of stretching or shrinking used,
relative to what is allowed, raised e.g. to the third power (in order to
ensure that several slightly bad lines are preferred over one really bad one).
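
A small Python sketch of that calculation, under stated assumptions: all widths are in the same unit, every space on the line has the same stretchability and shrinkability, and the default values 0.5 and 0.25 are invented for illustration (they are not TeX's actual parameters). The function name badness and its signature are likewise hypothetical.

```python
def badness(natural, target, n_spaces, stretch=0.5, shrink=0.25):
    """Cubed adjustment ratio of one line; inf if the line cannot fit.

    natural  -- width of the line with every space at its nominal width
    target   -- desired line width
    n_spaces -- number of inter-word spaces on the line
    stretch, shrink -- per-space limits (assumed values, for illustration)
    """
    diff = target - natural
    if diff == 0:
        return 0.0                 # already a perfect fit
    if n_spaces == 0:
        return float("inf")        # nothing to stretch or shrink
    allowed = n_spaces * (stretch if diff > 0 else shrink)
    ratio = abs(diff) / allowed    # fraction of the allowed adjustment used
    if diff < 0 and ratio > 1:
        return float("inf")        # spaces cannot shrink past their limit
    return ratio ** 3              # over-stretching is allowed but expensive


print(badness(58, 60, 4))   # line is 2 units short, 4 spaces to stretch
```

The cubing is what makes the trade-off work: three lines that each use half of their allowed stretch cost 3 × 0.125 = 0.375 in total, while a single line that uses all of its allowed stretch costs 1.0.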

An illustrated description can be found at http://defoe.sourceforge.net/folio/knuth-plass.html

Implementations in various languages are available on the web, e.g. Bram Stein's implementation in JavaScript: http://www.bramstein.com/projects/typeset/

Terje D. answered Sep 22 '22 08:09


This might be an old thread, but I wanted to share the solution anyway in case it helps.

Text Justification Algorithm

self_noted answered Sep 22 '22 08:09