Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Implementing Text Justification with Dynamic Programming

I'm trying to understand the concept of Dynamic Programming, via the course on MIT OCW here. The explanation on OCW video is great and all, but I feel like I don't really understand it until I implemented the explanation into code. While implementiing, I refer to some notes from the lecture note here, particularly page 3 of the note.

The problem is, I have no idea how to translate some of the mathematical notation to code. Here's some part of the solution I've implemented (and think it's implemented right):

import math  paragraph = "Some long lorem ipsum text." words = paragraph.split(" ")  # Count total length for all strings in a list of strings. # This function will be used by the badness function below. def total_length(str_arr):     total = 0      for string in str_arr:         total = total + len(string)      total = total + len(str_arr) # spaces     return total  # Calculate the badness score for a word. # str_arr is assumed be send as word[i:j] as in the notes # we don't make i and j as argument since it will require # global vars then. def badness(str_arr, page_width):     line_len = total_length(str_arr)     if line_len > page_width:         return float('nan')      else:         return math.pow(page_width - line_len, 3) 

Now the part I don't understand is on point 3 to 5 in the lecture notes. I literally don't understand and don't know where to start implementing those. So far, I've tried iterating the list of words, and counting the badness of each allegedly end of line, like this:

def justifier(str_arr, page_width):     paragraph = str_arr     par_len = len(paragraph)     result = [] # stores each line as list of strings     for i in range(0, par_len):         if i == (par_len - 1):             result.append(paragraph)         else:             dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)]              # Should I do a min(dag), get the index, and declares it as end of line? 

But then, I don't know how I can continue the function, and to be honest, I don't understand this line:

dag = [badness(paragraph[i:j], page_width) + justifier(paragraph[j:], page_width) for j in range(i + 1, par_len + 1)]  

and how I'll return justifier as an int (since I already decided to store return value in result, which is a list. Should I make another function and recurse from there? Should there be any recursion at all?

Could you please show me what to do next, and explain how this is dynamic programming? I really can't see where the recursion is, and what the subproblem is.

Thanks before.

like image 752
bertzzie Avatar asked Aug 13 '13 04:08

bertzzie


People also ask

What are the steps of dynamic programming approach?

There are three steps in finding a dynamic programming solution to a problem: (i) Define a class of subproblems, (ii) give a recurrence based on solving each subproblem in terms of simpler subproblems, and (iii) give an algorithm for computing the recurrence.


2 Answers

In case you have trouble understanding the core idea of dynamic programming itself here is my take on it:

Dynamic programming is essentially sacrificing space complexity for time complexity (but the extra space you use is usually very little compared to the time you save, making dynamic programming totally worth it if implemented correctly). You store the values from each recursive call as you go (e.g. in an array or a dictionary) so you can avoid computing for the second time when you run into the same recursive call in another branch of the recursion tree.

And no you do not have to use recursion. Here is my implementation of the question you were working on using just loops. I followed the TextAlignment.pdf linked by AlexSilva very closely. Hopefully you find this helpful.

def length(wordLengths, i, j):     return sum(wordLengths[i- 1:j]) + j - i + 1   def breakLine(text, L):     # wl = lengths of words     wl = [len(word) for word in text.split()]      # n = number of words in the text     n = len(wl)          # total badness of a text l1 ... li     m = dict()     # initialization     m[0] = 0          # auxiliary array     s = dict()      # the actual algorithm     for i in range(1, n + 1):         sums = dict()         k = i         while (length(wl, k, i) <= L and k > 0):             sums[(L - length(wl, k, i))**3 + m[k - 1]] = k             k -= 1         m[i] = min(sums)         s[i] = sums[min(sums)]      # actually do the splitting by working backwords     line = 1     while n > 1:         print("line " + str(line) + ": " + str(s[n]) + "->" + str(n))         n = s[n] - 1         line += 1 
like image 50
Joohwan Avatar answered Sep 21 '22 14:09

Joohwan


For anyone else still interested in this: The key is to move backwards from the end of the text (as mentioned here). If you do so, you just compare already memorized elements.

Say, words is a list of strings to be wrapped according to textwidth. Then, in the notation of the lecture, the task reduces to three lines of code:

import numpy as np  textwidth = 80  DP = [0]*(len(words)+1)  for i in range(len(words)-1,-1,-1):     DP[i] = np.min([DP[j] + badness(words[i:j],textwidth) for j in range(i+1,len(words)+1)]) 

With:

def badness(line,textwidth):      # Number of gaps     length_line = len(line) - 1      for word in line:         length_line += len(word)      if length_line > textwidth: return float('inf')      return ( textwidth - length_line )**3 

He mentions that one can add a second list to keep track of the breaking positions. You can do so by altering to code to:

DP = [0]*(len(words)+1) breaks = [0]*(len(words)+1)  for i in range(len(words)-1,-1,-1):     temp = [DP[j] + badness(words[i:j],args.textwidth) for j in range(i+1,len(words)+1)]      index = np.argmin(temp)      # Index plus position in upper list     breaks[i] = index + i + 1     DP[i] = temp[index] 

To recover the text, just use the list of breaking positions:

def reconstruct_text(words,breaks):                                                                                                                      lines = []     linebreaks = []      i = 0      while True:          linebreaks.append(breaks[i])         i = breaks[i]          if i == len(words):             linebreaks.append(0)             break      for i in range( len(linebreaks) ):         lines.append( ' '.join( words[ linebreaks[i-1] : linebreaks[i] ] ).strip() )      return lines 

Result: (text = reconstruct_text(words,breaks))

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. 

One might be tempted to add some whitespaces. This is pretty tricky (since one might come up with various aesthetic rules) but a naive try might be:

import re  def spacing(text,textwidth,maxspace=4):      for i in range(len(text)):          length_line = len(text[i])          if length_line < textwidth:              status_length = length_line             whitespaces_remain = textwidth - status_length             Nwhitespaces = text[i].count(' ')              # If whitespaces (to add) per whitespace exeeds             # maxspace, don't do anything.             if whitespaces_remain/Nwhitespaces > maxspace-1:pass             else:                 text[i] = text[i].replace(' ',' '*( 1 + int(whitespaces_remain/Nwhitespaces)) )                 status_length = len(text[i])                  # Periods have highest priority for whitespace insertion                 periods = text[i].split('.')                  # Can we add a whitespace behind each period?                 if len(periods) - 1 + status_length <= textwidth:                     text[i] = '. '.join(periods).strip()                  status_length = len(text[i])                 whitespaces_remain = textwidth - status_length                 Nwords = len(text[i].split())                 Ngaps = Nwords - 1                  if whitespaces_remain != 0:factor = Ngaps / whitespaces_remain                  # List of whitespaces in line i                 gaps = re.findall('\s+', text[i])                  temp = text[i].split()                 for k in range(Ngaps):                     temp[k] = ''.join([temp[k],gaps[k]])                  for j in range(whitespaces_remain):                     if status_length >= textwidth:pass                     else:                         replace = temp[int(factor*j)]                         replace = ''.join([replace, " "])                         temp[int(factor*j)] = replace                  text[i] = ''.join(temp)      return text 

What gives you: (text = spacing(text,textwidth))

Lorem  ipsum  dolor  sit  amet, consetetur  sadipscing  elitr,  sed  diam nonumy eirmod  tempor  invidunt  ut labore  et  dolore  magna aliquyam  erat,  sed diam voluptua.   At  vero eos  et accusam  et justo  duo dolores  et ea  rebum.  Stet clita  kasd  gubergren,  no  sea  takimata sanctus  est  Lorem  ipsum  dolor sit amet.   Lorem  ipsum  dolor  sit amet,  consetetur  sadipscing  elitr,  sed diam nonumy  eirmod  tempor invidunt  ut labore  et dolore  magna aliquyam  erat, sed diam  voluptua.  At vero eos et accusam et  justo duo dolores et ea rebum.  Stet clita  kasd gubergren, no sea  takimata sanctus est Lorem  ipsum dolor sit amet. 
like image 27
Suuuehgi Avatar answered Sep 21 '22 14:09

Suuuehgi