Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find longest substring of N unique characters

Tags:

algorithm

Input: str="abcdeefuiuiwiwwaaaa" n=3 output: "iwiwwaaaa" (longest substr with 3 diff chars)

I have a solution as below. My questions:

  1. How is the time complexity? I know it must be better than O(n^2), but not sure whether can conclude it's O(n).
  2. The solution below can not cover the whole ASCII, can we improve this without additional space?

    public static String getSubstrOfMChars(String str, int m) 
    {
         if (str==null || str.length()==0)
             return "";     
    
         int len = str.length();        
         String max = "";
    
         for(int i=0; i<len;) 
         {  
             StringBuilder sb = new StringBuilder();
             int counter = 1;
             int checker = 0;
             char firstChar = str.charAt(i);
             int firstCharPos = i;    // first char position in the string
             sb.append(firstChar);
             checker |= 1 << (firstChar - 'a');
    
             for(int j=i+1; j<len; j++) 
             {  
                 char currChar = str.charAt(j);
                 if (currChar == firstChar) 
                     firstCharPos++;                
    
                 int tester = checker & (1<<(currChar - 'a'));
                 if ( tester > 0 ) // already have such character
                 {
                     sb.append(currChar);
                     continue;
                 }
    
                 // new character
                 if (++counter > m) 
                 {
                    i = firstCharPos + 1;
    
                    if (sb.length() > max.length()) 
                    {
                        max = sb.toString();
                    }
                    break;
                 }
                 sb.append(currChar);                   
                 checker |= 1 << (currChar - 'a');              
            }
    
            if (counter <= m) 
            {               
                if ((counter==m) && sb.length() > max.length()) 
                {
                    max = sb.toString();
                }               
                break;
            }
    
         }
    
         return max;        
    }
    
like image 527
punchoyeah Avatar asked Jan 14 '14 16:01

punchoyeah


People also ask

How would you find the longest substring which contains only two unique characters?

If the length of string is n, then there can be n*(n+1)/2 possible substrings. A simple way is to generate all the substring and check each one whether it has exactly k unique characters or not. If we apply this brute force, it would take O(n2) to generate all substrings and O(n) to do a check on each one.

How do you find longest substring without repeating characters in a string?

Run a loop from i = 0 till N – 1 and consider a visited array. Run a nested loop from j = i + 1 to N – 1 and check whether the current character S[j] has already been visited. If true, break from the loop and consider the next window. Else, mark the current character as visited and maximize the length as j – i + 1.


2 Answers

There is an O(n). Let S be the string. Just go through the array with two pointers i and j and keep track of number K of different letters between S[i] and S[j]. Increment j whenever this number is smaller or equal n and increment i whenever K is greater than n. Also remember the longest substring for which K was equal to n.

In the implementation you also need a hash table to keep track of the last occurrence of the letter.

Python implementation:

def longest(S,n):
    i = j = K = 0
    res = (0,0)
    last = {}

    while i < len(S):
        # if current substring is better than others than save
        if K == n and j - i > res[1] - res[0]:
            res = (i,j)

        # if we reach the end of the string, we're done.
        if j + 1 > len(S):
            break
        # if we can go further with the right end than do it
        elif K <= n and j + 1 <= len(S):
            if not last.has_key(S[j]):
                K = K + 1
            last[S[j]] = j
            j = j + 1
        # if we must go further with the left end than do it
        else:
            if last.has_key(S[i]):
                del last[S[i]]
                K = K - 1
            i = i + 1
    return S[res[0]:res[1]]
like image 111
Łukasz Kidziński Avatar answered Sep 21 '22 12:09

Łukasz Kidziński


Your present code complexity is O(N^2) since you are using nested for loops to check for substrings that start from each character.

IMO you can do this in O(N*k) time and O(k) extra space (where k = number of unique characters allowed):

  1. Iterate the string from the beginning and add the 1st character in a map of value to last position found.
  2. Keep parsing the string and updating the last position found of each character in the map.
  3. When you get a new character, increment the count of characters and make last position found for this character = current position.
  4. When the count in map reaches k, iterate the map and search for the value of minimum position index. calculate present position - min(last position index) and update max length substring accordingly. Decrement count. Pop out this character from the map.
  5. Continue the above till you reach the end of the string.
like image 36
Abhishek Bansal Avatar answered Sep 22 '22 12:09

Abhishek Bansal