Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Dynamic programming exercise for string cutting

I have been working on the following problem from this book.

A certain string-processing language offers a primitive operation which splits a string into two pieces. Since this operation involves copying the original string, it takes n units of time for a string of length n, regardless of the location of the cut. Suppose, now, that you want to break a string into many pieces. The order in which the breaks are made can affect the total running time. For example, if you want to cut a 20-character string at positions 3 and 10, then making the first cut at position 3 incurs a total cost of 20+17=37, while doing position 10 first has a better cost of 20+10=30.

I need a dynamic programming algorithm that given m cuts, finds the minimum cost of cutting a string into m+1 pieces.

like image 808
Mark Avatar asked Apr 09 '12 03:04

Mark


3 Answers

The divide and conquer approach seems to me the best one for this kind of problem. Here is a Java implementation of the algorithm:

Note: the array m should be sorted in ascending order (use Arrays.sort(m);)

public int findMinCutCost(int[] m, int n) {
   int cost = n * m.length;
   for (int i=0; i<m.length; i++) {
      cost = Math.min(findMinCutCostImpl(m, n, i), cost);
   }
   return cost;
}

private int findMinCutCostImpl(int[] m, int n, int i) {
   if (m.length == 1) return n;
   int cl = 0, cr = 0;
   if (i > 0) {
      cl = Integer.MAX_VALUE;
      int[] ml = Arrays.copyOfRange(m, 0, i);
      int nl = m[i];
      for (int j=0; j<ml.length; j++) {
         cl = Math.min(findMinCutCostImpl(ml, nl, j), cl);
      }
   }
   if (i < m.length - 1) {
      cr = Integer.MAX_VALUE;
      int[] mr = Arrays.copyOfRange(m, i + 1, m.length);
      int nr = n - m[i];
      for (int j=0; j<mr.length; j++) {
         mr[j] = mr[j] - m[i];
      }
      for (int j=0; j<mr.length; j++) {
         cr = Math.min(findMinCutCostImpl(mr, nr, j), cr);
      }
   }
   return n + cl + cr;
}

For example :

 int n = 20;
 int[] m = new int[] { 10, 3 };

 System.out.println(findMinCutCost(m, n));

Will print 30

** Edit **

I have implemented two other methods to answer the problem in the question.

1. Median cut approximation

This method cut recursively always the biggest chunks. The results are not always the best solution, but offers a not negligible gain (in the order of +100000% gain from my tests) for a negligible minimal cut loss difference from the best cost.

public int findMinCutCost2(int[] m, int n) {
   if (m.length == 0) return 0;
   if (m.length == 1) return n;
      float half = n/2f;
      int bestIndex = 0;
      for (int i=1; i<m.length; i++) {
         if (Math.abs(half - m[bestIndex]) > Math.abs(half - m[i])) {
            bestIndex = i;
         }
      }
      int cl = 0, cr = 0;
      if (bestIndex > 0) {
         int[] ml = Arrays.copyOfRange(m, 0, bestIndex);
         int nl = m[bestIndex];
         cl = findMinCutCost2(ml, nl);
      }
      if (bestIndex < m.length - 1) {
         int[] mr = Arrays.copyOfRange(m, bestIndex + 1, m.length);
         int nr = n - m[bestIndex];
         for (int j=0; j<mr.length; j++) {
         mr[j] = mr[j] - m[bestIndex];
      }
      cr = findMinCutCost2(mr, nr);
   }
   return n + cl + cr;
}

2. A constant time multi-cut

Instead of calculating the minimal cost, just use different indices and buffers. Since this method executes in a constant time, it always returns n. Plus, the method actually split the string in substrings.

public int findMinCutCost3(int[] m, int n) {
   char[][] charArr = new char[m.length+1][];
   charArr[0] = new char[m[0]];
   for (int i=0, j=0, k=0; j<n; j++) {
      //charArr[i][k++] = string[j];   // string is the actual string to split
      if (i < m.length && j == m[i]) {
         if (++i >= m.length) {
            charArr[i] = new char[n - m[i-1]];
         } else {
            charArr[i] = new char[m[i] - m[i-1]];
         }
         k=0;
      }
   }
   return n;
}

Note: that this last method could easily be modified to accept a String str argument instead of n and set n = str.length(), and return a String[] array from charArr[][].

like image 84
Yanick Rochon Avatar answered Nov 16 '22 12:11

Yanick Rochon


For dynamic programming, I claim that all you really need to know is what the state space should be - how to represent partial problems.

Here we are dividing a string up into m+1 pieces by creating new breaks. I claim that a good state space is a set of (a, b) pairs, where a is the location of the start of a substring and b is the location of the end of the same substring, counted as number of breaks in the final broken down string. The cost associated with each pair is the minimum cost of breaking it up. If b <= a + 1, then the cost is 0, because there are no more breaks to put in. If b is larger, then the possible locations for the next break in that substring are the points a+1, a+2,... b-1. The next break is going to cost b-a regardless of where we put it, but if we put it at position k the minimum cost of later breaks is (a, k) + (k, b).

So to solve this with dynamic programming, build up a table (a, b) of minimum costs, where you can work out the cost of breaks on strings with k sections by considering k - 1 possible breaks and then looking up the costs of strings with at most k - 1 sections.

One way to expand on this would be to start by creating a table T[a, b] and setting all entries in that table to infinity. Then go over the table again and where b <= a+1 put T[a,b] = 0. This fills in entries representing sections of the original string which need no further cuts. Now scan through the table and for each T[a,b] with b > a + 1 consider every possible k such that a < k < b and if min_k ((length between breaks a and b) + T[a,k] + T[k,b]) < T[a,b] set T[a,b] to that minimum value. This recognizes where you now know a way to chop up the substrings represented by T[a,k] and T[k,b] cheaply, so this gives you a better way to chop up T[a,b]. If you now repeat this m times you are done - use a standard dynamic programming backtrack to work out the solution. It might help if you save the best value of k for each T[a,b] in a separate table.

like image 21
mcdowella Avatar answered Nov 16 '22 13:11

mcdowella


python code:

mincost(n, cut_list) =min {   n+ mincost(k,left_cut_list) + min(n-k, right_cut_list) }


import sys

def splitstr(n,cut_list):

        if len(cut_list) == 0: 
            return [0,[]]
        min_positions = []
        min_cost = sys.maxint
        for k in cut_list:
            left_split = [ x for x in cut_list if x < k]
            right_split = [ x-k for x in cut_list if x > k] 
            #print n,k, left_split, right_split
            lcost = splitstr(k,left_split)
            rcost = splitstr(n-k,right_split)      
            cost = n+lcost[0] + rcost[0]
            positions = [k] + lcost[1]+ [x+k for x in rcost[1]] 
            #print "cost:", cost, " min: ", positions
            if cost < min_cost:
                min_cost = cost
                min_positions = positions

        return ( min_cost, min_positions) 



print splitstr(20,[3,10,16])  # (40, [10, 3, 16])

print splitstr(20,[3,10]) # (30, [10, 3])

print splitstr(5,[1,2,3,4,5]) # (13, [2, 1, 3, 4, 5])

print splitstr(1,[1]) # (1, [1]) # m cuts m+1 substrings
like image 2
user1748643 Avatar answered Nov 16 '22 11:11

user1748643