Recently I was asked in an interview to design an algorithm that converts left-aligned input text (with spaces at the end of each line) into justified text (with no white space at the end of any complete line), similar to what MS Word does. I proposed a basic solution: count the words and spaces in each line, then distribute the spaces equally among all the gaps (he told me to assume that fractional spaces can be distributed between words). But then he asked me to consider the whole paragraph and modify the text so that the beauty of the text is not lost when an unequal distribution of spaces between words is inevitable.
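A minimal sketch of the per-line approach described above, in Python (the question itself has no code). It assumes monospaced characters, a single nominal space between words, and, as the interviewer allowed, fractional gap widths. `greedy_lines` and `gap_width` are hypothetical names chosen for illustration.

```python
from typing import List


def greedy_lines(words: List[str], width: int) -> List[List[str]]:
    """Fill each line with as many words as fit, using one space between words."""
    lines: List[List[str]] = []
    current: List[str] = []
    length = 0  # length of the current line, including single spaces
    for word in words:
        extra = len(word) if not current else len(word) + 1
        if current and length + extra > width:
            # The word does not fit: close this line and start a new one.
            lines.append(current)
            current, length = [word], len(word)
        else:
            current.append(word)
            length += extra
    if current:
        lines.append(current)
    return lines


def gap_width(line: List[str], width: int) -> float:
    """Width of each inter-word gap when the line is stretched to `width`.

    The leftover space is split equally; fractional widths are allowed.
    """
    gaps = len(line) - 1
    if gaps == 0:
        return 0.0  # a single word has no gaps to stretch
    free = width - sum(len(w) for w in line)
    return free / gaps


if __name__ == "__main__":
    for line in greedy_lines("the quick brown fox jumps over the lazy dog".split(), 15):
        print(line, gap_width(line, 15))
```

Each line is decided without looking ahead, so an early break can leave a later line very loose; that is the weakness the interviewer's follow-up question points at.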
I was unable to think of a proper solution at the time. Later he told me that this is done with dynamic programming. I am not sure whether there is already a standard algorithm for this; if there is, please share a useful link.
PS: The solution I proposed was a very abstract idea, so I don't have any code to show what I have already tried. Justification: http://en.wikipedia.org/wiki/Justification_(typesetting)
The standard algorithm for breaking paragraphs into lines is probably still the algorithm of Knuth & Plass, used by Knuth's typesetting system TeX. The algorithm, which 'avoids backtracking by a judicious use of the techniques of dynamic programming', is described in
Donald E. Knuth and Michael F. Plass, "Breaking Paragraphs into Lines", Software: Practice and Experience 11 (1981) 1119–1184, DOI: 10.1002/spe.4380111102; also available in Digital Typography, Ch. 3, pp. 67–155.
The algorithm considers each possible line break in turn, starting from the beginning of the paragraph, and for each one finds the sequence of preceding line breaks that gives the best result so far. Because the best sequence ending at a given break is determined by the break where its last line starts (together with the best sequence already recorded for that earlier break), only the potential starting points of the current line have to be considered when a new potential breakpoint is added, which makes the algorithm efficient.
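That idea can be written as a recurrence. The notation is my own, and it uses badness as the per-line cost; Knuth & Plass actually minimise "demerits" computed from badness and penalties, so treat this as a simplification:

```latex
\mathrm{best}(0) = 0, \qquad
\mathrm{best}(n) = \min_{a \in A_n} \bigl[\, \mathrm{best}(a) + \mathrm{badness}(a, n) \,\bigr]
```

Here A_n is the set of still-active breakpoints B_a from which a line ending at B_n is feasible. Keeping A_n small (dropping breakpoints that are too far back) is what keeps the work per breakpoint low.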
A simplified version of the algorithm (without hyphenation, for example) can be described like this:
    Add start of paragraph to list of active breakpoints
    For each possible breakpoint (space) B_n, starting from the beginning:
        For each breakpoint B_a in the active list:
            If B_a is too far away from B_n (the line cannot fit):
                Delete B_a from the active list
            Else:
                Calculate badness of the line from B_a to B_n
                If using B_a minimizes cumulative badness from start to B_n:
                    Record B_a and cumulative badness as best path to B_n
        If a best path to B_n was recorded:
            Add B_n to the active list
Following the recorded best-path links back from the end of the paragraph gives the linked list of breakpoints to use.
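A runnable Python sketch of the pseudocode above, under simplifying assumptions of my own: monospaced text, breakpoints only between words, spaces that may stretch but not shrink, badness taken as the cube of the leftover space on a line (standing in for the stretch ratio), and a last line that costs nothing. `break_lines` is a hypothetical name, not taken from the paper or any library.

```python
from typing import Dict, List, Tuple


def break_lines(words: List[str], width: int) -> List[List[str]]:
    """Choose line breaks that minimise total badness (simplified, Knuth-Plass style)."""
    n_words = len(words)
    # best[b] = (cumulative badness, previous breakpoint); breakpoint b means
    # "a line ends just before words[b]"; breakpoint 0 is the paragraph start.
    best: Dict[int, Tuple[int, int]] = {0: (0, -1)}
    active: List[int] = [0]  # breakpoints that may still start a line
    for b_n in range(1, n_words + 1):
        for b_a in list(active):  # iterate over a copy so we can delete
            natural = sum(len(w) for w in words[b_a:b_n]) + (b_n - b_a - 1)
            if natural > width and b_n - b_a > 1:
                # Too far away: every later breakpoint would be even longer.
                active.remove(b_a)
                continue
            # Assumption: the last line is not stretched, so it costs nothing.
            badness = 0 if b_n == n_words else max(width - natural, 0) ** 3
            total = best[b_a][0] + badness
            if b_n not in best or total < best[b_n][0]:
                best[b_n] = (total, b_a)
        # b_n - 1 is always active and gives a one-word line, so best[b_n] exists.
        active.append(b_n)
    # Follow the recorded links back from the end of the paragraph.
    breaks: List[int] = []
    b = n_words
    while b > 0:
        breaks.append(b)
        b = best[b][1]
    breaks.reverse()
    lines: List[List[str]] = []
    start = 0
    for end in breaks:
        lines.append(words[start:end])
        start = end
    return lines
```

For example, with the words `aaa bb cc ddddd` and a width of 6, filling lines greedily gives `aaa bb / cc / ddddd` (badness 0 + 64 + 0 = 64), whereas this sketch chooses `aaa / bb cc / ddddd` (27 + 1 + 0 = 28), spreading the unavoidable slack over two lines.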
The badness of a line under consideration can be calculated like this: each space is assigned a nominal width, a stretchability and a shrinkability. The badness is then the ratio of the stretching or shrinking actually used relative to what is allowed, raised to some power, e.g. the third (so that several slightly bad lines are preferred over one really bad one).
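A sketch of that badness calculation in Python. The `Glue` class and `line_badness` function are hypothetical names; the factor of 100 follows TeX's convention that badness is roughly 100 times the cube of the adjustment ratio, and treating a line that would need more shrinking than allowed as infeasible is likewise modelled on TeX.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Glue:
    """An inter-word space: nominal width plus how far it may stretch or shrink."""
    width: float
    stretch: float
    shrink: float


def line_badness(word_widths: List[float], spaces: List[Glue],
                 line_width: float) -> Optional[float]:
    """Badness of one line, or None if it cannot be shrunk enough to fit."""
    natural = sum(word_widths) + sum(g.width for g in spaces)
    diff = line_width - natural  # > 0: must stretch, < 0: must shrink
    if diff == 0:
        return 0.0
    allowed = sum(g.stretch for g in spaces) if diff > 0 else sum(g.shrink for g in spaces)
    if allowed == 0:
        # Nothing can stretch: infinitely bad; nothing can shrink: infeasible.
        return float("inf") if diff > 0 else None
    ratio = diff / allowed  # the adjustment ratio r
    if ratio < -1:
        return None  # spaces would have to shrink more than allowed
    return 100 * abs(ratio) ** 3
```

With this measure, two lines that each use half of their allowed stretch cost 12.5 + 12.5 = 25, while one line using all of it next to a perfect line costs 100 + 0 = 100, which is why cubing favours spreading the slack.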
An illustrated description can be found at http://defoe.sourceforge.net/folio/knuth-plass.html
Implementations in various languages are available on the web, e.g. Bram Stein's implementation in JavaScript: http://www.bramstein.com/projects/typeset/
This might be an old thread, but I wanted to share a solution anyway in case it helps.
Text Justification Algorithm