Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Longest substring where every character appear even number of times (possibly zero)

Suppose we have a string s. We want to find the length of the longest substring of s such that every character in the substring appears even number of times (possible zero).
WC Time: O(nlgn). WC space: O(n)

First, it's obvious that the substring must be of an even length. Second, I'm familiar with the sliding window method where we anchor some right index and look for the left-most index to match your criterion. I tried to apply this idea here but couldn't really formulate it.

Also, it seems to me like a priority queue could come in handy (since the O(nlgn) requirement is sort of hinting it)

I'd be glad for help!

like image 854
Elimination Avatar asked May 28 '18 10:05

Elimination


People also ask

What is longest substring without repeating characters?

Explanation: “abc” is the longest substring without repeating characters among all the substrings. Explanation: “wke” is the longest substring without repeating characters among all the substrings.

What is the time complexity for finding the longest substring?

The time complexity of finding the longest non-repeated substring is O(n).


2 Answers

I'm not sure exactly what amit proposes so if this is it, please consider it another explanation. This can be accomplished in a single traversal.

Produce a bitset of length equal to the alphabet's for each index of the string. Store the first index for each unique bitset encountered while traversing the string. Update the largest interval between a current and previously seen bitset.

For example, the string, "aabccab":

  a a b c c a b
  0 1 2 3 4 5 6 (index)

                _
0 1 0 0 0 0 1 1  |  (vertical)
0 0 0 1 1 1 1 0  |  bitset for
0 0 0 0 1 0 0 0 _|  each index
  ^           ^
  |___________|

   largest interval
   between current and
   previously seen bitset

The update for each iteration can be accomplished in O(1) by preprocessing a bit mask for each character to XOR with the previous bitset:

bitset     mask
  0         1     1
  1   XOR   0  =  1
  0         0     0

means update the character associated with the first bit in the alphabet-bitset.

like image 68
גלעד ברקן Avatar answered Sep 21 '22 10:09

גלעד ברקן


Let's define the following bitsets:

B[c,i] = 1 if character c appeared in s[0,...,i] even number of times.

Calculating B[c,i] takes linear time (for all values):

for all c:
  B[c,-1] = 0
for i from 0 to len(arr):
  B[c, i] = B[s[i], i-1] XOR 1

Since the alphabet is of constant size, so are the bitsets (for each i).

Note that the condition:

every character in the substring appears even number of times

is true for substring s[i,j] if and only if the bitset of index i is identical to the bitset of index j (otherwise, there is a bit that repeated odd number of times in this substring ; other direction: If there is a bit that repeated number of times, then its bit cannot be identical).

So, if we store all bitsets in some set (hash set/tree set), and keep only latest entry, this preprocessing takes O(n) or O(nlogn) time (depending on hash/tree set).

In a second iteration, for each index, find the farther away index with identical bitset (O(1)/O(logn), depending if hash/tree set), find the substring length, and mark it as candidate. At the end, take the longest candidate.

This solution is O(n) space for the bitsets, and O(n)/O(nlogn) time, depending if using hash/tree solution.


Pseudo code:

def NextBitset(B, c): # O(1) time
  for each x in alphabet \ {c}:
    B[x, i] = B[x, i-1]
   B[c, i] = B[c, i-1] XOR 1

for each c in alphabet: # O(1) time
  B[c,-1] = 0
map = new hash/tree map (bitset->int)

# first pass: # O(n)/O(nlogn) time
for i from 0 to len(s):
   # Note, we override with the latest element.
   B = NextBitset(B, s[i])
   map[B] = i

for each c in alphabet: # O(1) time
  B[c,-1] = 0
max_distance = 0
# second pass: O(n)/ O(nlogn) time.
for i from 0 to len(s):
  B = NextBitset(B, s[i])
  j = map.find(B)  # O(1) / O(logn)
  max_distance = max(max_distance, j-i)
like image 25
amit Avatar answered Sep 20 '22 10:09

amit