Implementing Text Justification with Dynamic Programming

Tags:

I'm trying to understand the concept of Dynamic Programming, via the course on MIT OCW here. The explanation on OCW video is great and all, but I feel like I don't really understand it until I implemented the explanation into code. While implementiing, I refer to some notes from the lecture note here, particularly page 3 of the note.

The problem is, I have no idea how to translate some of the mathematical notation to code. Here's some part of the solution I've implemented (and think it's implemented right):

import math  paragraph = "Some long lorem ipsum text." words = paragraph.split(" ")  # Count total length for all strings in a list of strings. # This function will be used by the badness function below. def total_length(str_arr):     total = 0      for string in str_arr:         total = total + len(string)      total = total + len(str_arr) # spaces     return total  # Calculate the badness score for a word. # str_arr is assumed be send as word[i:j] as in the notes # we don't make i and j as argument since it will require # global vars then. def badness(str_arr, page_width):     line_len = total_length(str_arr)     if line_len > page_width:         return float('nan')      else:         return math.pow(page_width - line_len, 3)

Now the part I don't understand is on point 3 to 5 in the lecture notes. I literally don't understand and don't know where to start implementing those. So far, I've tried iterating the list of words, and counting the badness of each allegedly end of line, like this:

def justifier(str_arr, page_width):     paragraph = str_arr     par_len = len(paragraph)     result = [] # stores each line as list of strings     for i in range(0, par_len):         if i == (par_len - 1):             result.append(paragraph)         else:             dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)]              # Should I do a min(dag), get the index, and declares it as end of line?

But then, I don't know how I can continue the function, and to be honest, I don't understand this line:

dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)]

and how I'll return justifier as an int (since I already decided to store return value in result, which is a list. Should I make another function and recurse from there? Should there be any recursion at all?

Could you please show me what to do next, and explain how this is dynamic programming? I really can't see where the recursion is, and what the subproblem is.

Thanks before.

752

asked Aug 13 '13 04:08

bertzzie

2 Answers

In case you have trouble understanding the core idea of dynamic programming itself here is my take on it:

Dynamic programming is essentially sacrificing space complexity for time complexity (but the extra space you use is usually very little compared to the time you save, making dynamic programming totally worth it if implemented correctly). You store the values from each recursive call as you go (e.g. in an array or a dictionary) so you can avoid computing for the second time when you run into the same recursive call in another branch of the recursion tree.

And no you do not have to use recursion. Here is my implementation of the question you were working on using just loops. I followed the TextAlignment.pdf linked by AlexSilva very closely. Hopefully you find this helpful.

def length(wordLengths, i, j):     return sum(wordLengths[i- 1:j]) + j - i + 1   def breakLine(text, L):     # wl = lengths of words     wl = [len(word) for word in text.split()]      # n = number of words in the text     n = len(wl)          # total badness of a text l1 ... li     m = dict()     # initialization     m[0] = 0          # auxiliary array     s = dict()      # the actual algorithm     for i in range(1, n + 1):         sums = dict()         k = i         while (length(wl, k, i) <= L and k > 0):             sums[(L - length(wl, k, i))**3 + m[k - 1]] = k             k -= 1         m[i] = min(sums)         s[i] = sums[min(sums)]      # actually do the splitting by working backwords     line = 1     while n > 1:         print("line " + str(line) + ": " + str(s[n]) + "->" + str(n))         n = s[n] - 1         line += 1

answered Sep 21 '22 14:09

Joohwan

For anyone else still interested in this: The key is to move backwards from the end of the text (as mentioned here). If you do so, you just compare already memorized elements.

Say, words is a list of strings to be wrapped according to textwidth. Then, in the notation of the lecture, the task reduces to three lines of code:

import numpy as np  textwidth = 80  DP = [0]*(len(words)+1)  for i in range(len(words)-1,-1,-1):     DP[i] = np.min([DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)])

With:

def badness(line,textwidth):      # Number of gaps     length_line = len(line) - 1      for word in line:         length_line += len(word)      if length_line > textwidth: return float('inf')      return ( textwidth - length_line )**3

He mentions that one can add a second list to keep track of the breaking positions. You can do so by altering to code to:

DP = [0]*(len(words)+1) breaks = [0]*(len(words)+1)  for i in range(len(words)-1,-1,-1):     temp = [DP[j] + badness(words[i:j],args.textwidth) for j in range(i+1,len(words)+1)]      index = np.argmin(temp)      # Index plus position in upper list     breaks[i] = index + i + 1     DP[i] = temp[index]

To recover the text, just use the list of breaking positions:

def reconstruct_text(words,breaks):                                                                                                                      lines = []     linebreaks = []      i = 0      while True:          linebreaks.append(breaks[i])         i = breaks[i]          if i == len(words):             linebreaks.append(0)             break      for i in range( len(linebreaks) ):         lines.append( ' '.join( words[ linebreaks[i-1] : linebreaks[i] ] ).strip() )      return lines

Result: (text = reconstruct_text(words,breaks))

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

One might be tempted to add some whitespaces. This is pretty tricky (since one might come up with various aesthetic rules) but a naive try might be:

import re  def spacing(text,textwidth,maxspace=4):      for i in range(len(text)):          length_line = len(text[i])          if length_line < textwidth:              status_length = length_line             whitespaces_remain = textwidth - status_length             Nwhitespaces = text[i].count(' ')              # If whitespaces (to add) per whitespace exeeds             # maxspace, don't do anything.             if whitespaces_remain/Nwhitespaces > maxspace-1:pass             else:                 text[i] = text[i].replace(' ',' '*( 1 + int(whitespaces_remain/Nwhitespaces)) )                 status_length = len(text[i])                  # Periods have highest priority for whitespace insertion                 periods = text[i].split('.')                  # Can we add a whitespace behind each period?                 if len(periods) - 1 + status_length <= textwidth:                     text[i] = '. '.join(periods).strip()                  status_length = len(text[i])                 whitespaces_remain = textwidth - status_length                 Nwords = len(text[i].split())                 Ngaps = Nwords - 1                  if whitespaces_remain != 0:factor = Ngaps / whitespaces_remain                  # List of whitespaces in line i                 gaps = re.findall('\s+', text[i])                  temp = text[i].split()                 for k in range(Ngaps):                     temp[k] = ''.join([temp[k],gaps[k]])                  for j in range(whitespaces_remain):                     if status_length >= textwidth:pass                     else:                         replace = temp[int(factor*j)]                         replace = ''.join([replace, " "])                         temp[int(factor*j)] = replace                  text[i] = ''.join(temp)      return text

What gives you: (text = spacing(text,textwidth))

Lorem  ipsum  dolor  sit  amet, consetetur  sadipscing  elitr,  sed  diam nonumy eirmod  tempor  invidunt  ut labore  et  dolore  magna aliquyam  erat,  sed diam voluptua.   At  vero eos  et accusam  et justo  duo dolores  et ea  rebum.  Stet clita  kasd  gubergren,  no  sea  takimata sanctus  est  Lorem  ipsum  dolor sit amet.   Lorem  ipsum  dolor  sit amet,  consetetur  sadipscing  elitr,  sed diam nonumy  eirmod  tempor invidunt  ut labore  et dolore  magna aliquyam  erat, sed diam  voluptua.  At vero eos et accusam et  justo duo dolores et ea rebum.  Stet clita  kasd gubergren, no sea  takimata sanctus est Lorem  ipsum dolor sit amet.

answered Sep 21 '22 14:09

Suuuehgi

Related questions
                            
                                Understanding Freeman chain codes for OCR
                            
                                Optimizing queries for the next and previous element
                            
                                Check connection between two points on 2D plane
                            
                                What is the difference between breadth first searching and level order traversal?
                            
                                How to design an algorithm to calculate countdown style maths number puzzle
                            
                                How to calculate the number of coprime subsets of the set {1,2,3,..,n}
                            
                                check if a tree is a binary search tree
                            
                                What are some algorithms that will allow me to simulate planetary physics?
                            
                                Finding kth-shortest paths?
                            
                                How to generate the power-set of a given List?
                            
                                Algorithm to find the maximum sum in a sequence of overlapping intervals
                            
                                How many possible states does the 8-puzzle have?
                            
                                Inserting an equal value element
                            
                                How do you efficiently generate a list of K non-repeating integers between 0 and an upper bound N [duplicate]
                            
                                Is there any technical reason why std::lower_bound is not specialized for red-black tree iterators?
                            
                                Algorithm to find Lucky Numbers
                            
                                How can you detect if two regular expressions overlap in the strings they can match?
                            
                                what is the meaning of O(1), O(n), O(n*n) memory? [duplicate]
                            
                                Is there any good open-source or freely available Chinese segmentation algorithm available? [closed]
                            
                                How to generate a verification code/number?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

Implementing Text Justification with Dynamic Programming

Tags:

algorithm

python-3.x

dynamic-programming

bertzzie

People also ask

2 Answers

Joohwan

Suuuehgi

Recent Activity

Donate For Us