Suppose we have a string
s
. We want to find the length of the longest substring ofs
such that every character in the substring appears even number of times (possible zero).
WC Time:O(nlgn)
. WC space:O(n)
First, it's obvious that the substring must be of an even length. Second, I'm familiar with the sliding window method where we anchor some right
index and look for the left-most index to match your criterion. I tried to apply this idea here but couldn't really formulate it.
Also, it seems to me like a priority queue could come in handy (since the O(nlgn)
requirement is sort of hinting it)
I'd be glad for help!
Explanation: “abc” is the longest substring without repeating characters among all the substrings. Explanation: “wke” is the longest substring without repeating characters among all the substrings.
The time complexity of finding the longest non-repeated substring is O(n).
I'm not sure exactly what amit proposes so if this is it, please consider it another explanation. This can be accomplished in a single traversal.
Produce a bitset of length equal to the alphabet's for each index of the string. Store the first index for each unique bitset encountered while traversing the string. Update the largest interval between a current and previously seen bitset.
For example, the string, "aabccab":
a a b c c a b
0 1 2 3 4 5 6 (index)
_
0 1 0 0 0 0 1 1 | (vertical)
0 0 0 1 1 1 1 0 | bitset for
0 0 0 0 1 0 0 0 _| each index
^ ^
|___________|
largest interval
between current and
previously seen bitset
The update for each iteration can be accomplished in O(1) by preprocessing a bit mask for each character to XOR
with the previous bitset:
bitset mask
0 1 1
1 XOR 0 = 1
0 0 0
means update the character associated with the first bit in the alphabet-bitset.
Let's define the following bitsets:
B[c,i] = 1 if character c appeared in s[0,...,i] even number of times.
Calculating B[c,i]
takes linear time (for all values):
for all c:
B[c,-1] = 0
for i from 0 to len(arr):
B[c, i] = B[s[i], i-1] XOR 1
Since the alphabet is of constant size, so are the bitsets (for each i
).
Note that the condition:
every character in the substring appears even number of times
is true for substring s[i,j]
if and only if the bitset of index i
is identical to the bitset of index j
(otherwise, there is a bit that repeated odd number of times in this substring ; other direction: If there is a bit that repeated number of times, then its bit cannot be identical).
So, if we store all bitsets in some set (hash set/tree set), and keep only latest entry, this preprocessing takes O(n)
or O(nlogn)
time (depending on hash/tree set).
In a second iteration, for each index, find the farther away index with identical bitset (O(1)/O(logn)
, depending if hash/tree set), find the substring length, and mark it as candidate. At the end, take the longest candidate.
This solution is O(n)
space for the bitsets, and O(n)/O(nlogn)
time, depending if using hash/tree solution.
Pseudo code:
def NextBitset(B, c): # O(1) time
for each x in alphabet \ {c}:
B[x, i] = B[x, i-1]
B[c, i] = B[c, i-1] XOR 1
for each c in alphabet: # O(1) time
B[c,-1] = 0
map = new hash/tree map (bitset->int)
# first pass: # O(n)/O(nlogn) time
for i from 0 to len(s):
# Note, we override with the latest element.
B = NextBitset(B, s[i])
map[B] = i
for each c in alphabet: # O(1) time
B[c,-1] = 0
max_distance = 0
# second pass: O(n)/ O(nlogn) time.
for i from 0 to len(s):
B = NextBitset(B, s[i])
j = map.find(B) # O(1) / O(logn)
max_distance = max(max_distance, j-i)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With