I have a two strings:
string1 - hello how are you
,
String2 - olo
(including space character)
Output: lo ho
( hello how are you )
lo ho
is the only substring that contain all characters of string2.
Can anyone please suggest a good algorithm for this ( i can think og only Brute Force algo - O(n^2).
Also output should be the minimum length string(in case of multiple options).
Simple Approach: The idea is to run a loop from start to end and for every index in the given string check whether the sub-string can be formed from that index. This can be done by running a nested loop traversing the given string and in that loop running another loop checking for sub-string from every index.
Keep two pointer l
and r
, and a hash table M = character -> count
for characters in string2 that do not occur in s[l..r]
.
Initially set l = 0
and r
so that string1[l..r]
contains all the characters of string2
(if possible). You do that by removing characters from M until it is empty.
Then proceed by incrementing r
by one in each step and then incrementing l
as much as possible while still keeping M empty. The minimum over all r - l + 1
(the length of the substring s[l..r]
) is the solution.
Pythonish pseudocode:
n = len(string1)
M = {} # let's say M is empty if it contains no positive values
for c in string2:
M[c]++
l = 0
r = -1
while r + 1 < n and M not empty:
r++
M[string1[r]]--
if M not empty:
return "no solution"
answer_l, answer_r = l, r
while True:
while M[string1[l]] < 0:
M[string1[l]]++
l++
if r - l + 1 < answer_r - anwer_l + 1:
answer_l, answer_r = l, r
r++
if r == n:
break
M[string1[r]]--
return s[answer_l..answer_r]
The "is empty" checks can be implemented in O(1) if you maintain the number of positive entries when performing the increment and decrement operations.
Let n
be the length of string1
and m
be the length of string2
.
Note that l
and r
are only ever incremented, so there are at most O(n) increments and thus at most O(n) instructions are executed in the last outer loop.
If M
is implemented as an array (I assume the alphabet is constant size), you get runtime
O(n + m), which is optimal. If the alphabet is too large, you can use a hash table to get expected O(n + m).
Example execution:
string1 = "abbabcdbcb"
string2 = "cbb"
# after first loop
M = { 'a': 0, 'b': 2, 'c': 1, 'd': 0 }
# after second loop
l = 0
r = 5
M = { 'a': -2, 'b': -1, 'c': 0, 'd': 0 }
# increment l as much as possible:
l = 2
r = 5
M = { 'a': -1, 'b': 0, 'c': 0, 'd': 0 }
# increment r by one and then l as much as possible
l = 2
r = 6
M = { 'a': -1, 'b': 0, 'c': 0, 'd': -1 }
# increment r by one and then l as much as possible
l = 4
r = 7
M = { 'a': 0, 'b': 0, 'c': 0, 'd': -1 }
# increment r by one and then l as much as possible
l = 4
r = 8
M = { 'a': 0, 'b': 0, 'c': -1, 'd': -1 }
# increment r by one and then l as much as possible
l = 7
r = 9
M = { 'a': 0, 'b': 0, 'c': 0, 'd': 0 }
The best solution is s[7..9].
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With