
An interview question - Split text into sub-strings according to rules

Tags:

algorithm

Split text into sub-strings according to below rules:

  • a) The length of each sub-string should be less than or equal to M
  • b) The length of a sub-string should be less than or equal to N (N < M) if the sub-string contains any numeric char
  • c) The total number of sub-strings should be as small as possible
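Rules (a) and (b) can be written down as a small validity check. This is only a sketch: the names `is_valid_chunk` and `is_valid_split` are made up for illustration, and it assumes "numeric char" means one of the ASCII digits 0-9.

```python
DIGITS = "0123456789"  # assumption: "numeric char" means an ASCII digit


def is_valid_chunk(chunk, m, n):
    """Return True if a non-empty chunk satisfies rules (a) and (b)."""
    has_digit = any(c in DIGITS for c in chunk)
    limit = n if has_digit else m  # rule (b) is the stricter one, since N < M
    return 0 < len(chunk) <= limit


def is_valid_split(text, chunks, m, n):
    """A split is valid if the chunks rebuild the text and each chunk is valid."""
    return "".join(chunks) == text and all(is_valid_chunk(c, m, n) for c in chunks)
```

Rule (c) then asks for a valid split with as few chunks as possible.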

I have no clue how to solve this question. I guess it is related to "dynamic programming". Can anybody help me implement it using C# or Java? Thanks a lot.

superche asked Jul 11 '10 13:07


2 Answers

Idea

A greedy approach is the way to go:

  • If the current text is empty, you're done.
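  • Take the first N characters (or the whole text, if it is shorter). If any of them is a digit, this is a new substring. Chop it off and go back to the beginning.
  • Otherwise, extend the digitless segment one character at a time, stopping before the first digit or at M characters, whichever comes first. This is a new substring. Chop it off and go back to the beginning.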

Proof

Here's a reductio-ad-absurdum proof that the above yields an optimal solution. Assume there is a better split than the greedy split. Let's skip to the point where the two splits start to differ and remove everything before this point.

Case 1) A digit among the first N characters.

Assume that there is an input for which chopping off the first N characters cannot yield an optimal solution.

Greedy split:   |--N--|...
A better split: |---|--...
                      ^
                      +---- this segment can be shortened from the left side

However, the second segment of the putative better solution can always be shortened from the left side (removing characters never breaks a length rule), and the first one extended to N characters, without altering the number of segments. Therefore, a contradiction: this split is not better than the greedy split.

Case 2) No digit among the first N characters, so the greedy split takes the first K (N <= K <= M) digit-free characters.

Assume that there is an input for which chopping off the first K characters cannot yield an optimal solution.

Greedy split:   |--K--|...
A better split: |---|--...
                      ^
                      +---- this segment can be shortened from the left side

Again, the "better" split can be transformed, without altering the number of segments, to the greedy split, which contradicts the initial assumption that there is a better split than the greedy split.

Therefore, the greedy split is optimal. Q.E.D.
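Since the question mentions dynamic programming, the proof can also be sanity-checked by comparing the greedy count with a DP minimum on small inputs. The sketch below is not part of the original answer: `greedy_count` and `dp_count` are illustrative names, digits are assumed to be ASCII 0-9, and both functions assume 1 <= N < M.

```python
DIGITS = "0123456789"  # assumption: only ASCII digits count as numeric


def greedy_count(text, m, n):
    """Number of chunks produced by the greedy rule described above."""
    count, j = 0, 0
    while j < len(text):
        i, j = j, min(len(text), j + n)
        if not any(c in DIGITS for c in text[i:j]):
            # Digit-free so far: extend up to M characters, stopping before a digit.
            while j < len(text) and j - i < m and text[j] not in DIGITS:
                j += 1
        count += 1
    return count


def dp_count(text, m, n):
    """Minimum number of valid chunks, by dynamic programming over prefixes."""
    # best[k] is the minimum number of chunks needed for text[:k].
    best = [0] + [float("inf")] * len(text)
    for end in range(1, len(text) + 1):
        has_digit = False
        # Grow the last chunk leftwards from `end`, at most M characters long.
        for start in range(end - 1, max(end - m, 0) - 1, -1):
            has_digit = has_digit or text[start] in DIGITS
            if has_digit and end - start > n:
                break  # any longer chunk ending at `end` also breaks rule (b)
            best[end] = min(best[end], best[start] + 1)
    return best[len(text)]
```

For example, `greedy_count("asdasd3324asdfsdxf23", 5, 4)` and `dp_count("asdasd3324asdfsdxf23", 5, 4)` both return 5. The DP takes O(L * M) time for a text of length L, while the greedy pass is linear.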

Implementation (Python)

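import sys

# Pass M, N and the text as command-line arguments, in that order.
m, n, text = int(sys.argv[1]), int(sys.argv[2]), sys.argv[3]
textLen = len(text)
isDigit = [c in '0123456789' for c in text]

chunks, i, j = [], 0, 0
while j < textLen:
    # Tentatively take the next N characters.
    i, j = j, min(textLen, j + n)
    if not any(isDigit[i:j]):
        # No digit so far: extend up to M characters, stopping before a digit.
        while j < textLen and j - i < m and not isDigit[j]:
            j += 1
    chunks.append(text[i:j])
print(chunks)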

Implementation (Java)

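import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import org.junit.Assert;
import org.junit.Test;

public class SO {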
   public List<String> go(int m, int n, String text) {
      if (text == null)
         return Collections.emptyList();
      List<String> chunks = new ArrayList<String>();

      int i = 0;
      int j = 0;
      while (j < text.length()) {
         i = j;         
         j = Math.min(text.length(), j + n);
         boolean ok = true;
         for (int k = i; k < j; k++) 
            if (Character.isDigit(text.charAt(k))) {
               ok = false;              
               break;                   
            }                   
         if (ok)        
            while (j < text.length() && j - i < m && !Character.isDigit(text.charAt(j)))
               j++;                     
         chunks.add(text.substring(i, j));
      }         
      return chunks;
   }    

   @Test
   public void testIt() {
      Assert.assertEquals(
         Arrays.asList("asdas", "d332", "4asd", "fsdxf", "23"),
         go(5, 4, "asdasd3324asdfsdxf23"));
   }    
}
Bolo answered Oct 17 '22 02:10


Bolo has provided a greedy algorithm in his answer and asked for a counter-example. Well, there's no counter-example, because that's a perfectly correct approach. Here's the proof. Although it's a bit wordy, it often happens that the proof is longer than the algorithm itself :)

Let's imagine we have an input of length L and have constructed an answer A with our algorithm. Now, suppose there's a better answer B, i.e., B has fewer segments than A does.

Let's say the first segment in A has length la and the first segment in B has length lb. Then la >= lb, because we've chosen the first segment in A to have the maximum possible length. If lb < la, we can extend the first segment of B to length la by taking characters from the segments that follow it. Those segments only get shorter, so they stay valid, and the overall number of segments in B does not increase. This gives us another optimal solution B' that has the same first segment as A.

Now remove that first segment from A and B' and repeat the operation for the remaining text of length L' < L. Do it until there are no segments left. This means answer A is equal to some optimal solution.
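The exchange step can be made concrete with a short sketch. It is only an illustration of the argument, not code from the answer: `longest_valid_prefix` and `exchange` are hypothetical names, digits are assumed to be ASCII 0-9, and the input is assumed to be a valid split with 1 <= N < M.

```python
DIGITS = "0123456789"  # assumption: only ASCII digits count as numeric


def longest_valid_prefix(text, m, n):
    """Length of the longest prefix of `text` that is a valid segment (la)."""
    k = min(len(text), n)
    if not any(c in DIGITS for c in text[:k]):
        while k < len(text) and k < m and text[k] not in DIGITS:
            k += 1
    return k


def exchange(segments, m, n):
    """Turn a valid split B into the greedy split without adding segments."""
    segments = [s for s in segments if s]  # ignore empty segments
    result = []
    while segments:
        text = "".join(segments)
        la = longest_valid_prefix(text, m, n)
        result.append(text[:la])
        # Remove `la` characters from the front of B. Since la >= lb, the
        # first segment of B always disappears, so the count never grows.
        remaining = la
        while remaining > 0:
            head = segments.pop(0)
            if len(head) > remaining:
                # Keep the shortened segment; shortening keeps it valid.
                segments.insert(0, head[remaining:])
                remaining = 0
            else:
                remaining -= len(head)
    return result
```

Calling `exchange(["asd", "asd", "332", "4as", "dfs", "dxf", "23"], 5, 4)` rewrites a valid 7-segment split into `["asdas", "d332", "4asd", "fsdxf", "23"]`, which is the greedy split with 5 segments.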

Nikita Rybak answered Oct 17 '22 01:10