I was asked the following question in an onsite interview:
A string is considered "balanced" when every letter in the string appears both in uppercase and lowercase. For e.g.,
CATattac
is balanced (a
,c
,t
occur in both cases), whileMadam
is not (a
,d
only appear in lowercase). Write a function that, given a string, returns the shortest balanced substring of that string. For e.g.,:
“azABaabza” should return “ABaab”
“TacoCat” should return -1 (not balanced)
“AcZCbaBz” should returns the entire string
Doing it with the brute force approach is trivial - calculating all the pairs of substrings and then checking if they are balanced, while keeping track of the size and starting index of the smallest one.
How do I optimize? I have a strong feeling it can be done with a sliding-window/two-pointer approach, but I am not sure how. When to update the pointers of the sliding window?
Edit: Removing the sliding-window tag since this is not a sliding-window problem (as discussed in the comments).
Find the shortest substring in a given string that contains all the characters of a given word or Find the Smallest window in a string containing all characters of another string 1. Generate all the substrings of string S 2. For every substring of S, check whether the substring contains all characters of string T 3.
Of the set of possible substrings 'dbca' is the shortest substring having all the distinct characters of given string. Input: aaab Output: ab Explanation: Possible substrings= {aaab, aab, ab} Of the set of possible substrings 'ab' is the shortest substring having all the distinct characters of given string. Try It!
76. Minimum Window Substring Given two strings s and t of lengths m and n respectively, return the minimum window substring of s such that every character in t (including duplicates) is included in the window. If there is no such substring, return the empty string "".
You are given a string S that is made of lowercase English alphabets. Determine the length of the smallest substring that contains the maximum number of distinct characters. The only line of input consists of a string that is made of lower case English alphabets. Print the required answer.
Due to the special property of string. There is only 26 uppercase letters and 26 lowercase letters.
We can loop every 26 letter j
and denote the minimum length for any substrings starting from position i
to find matches for uppercase and lowercase letter j
be len[i][j]
Demo C++ code:
string s = "CATattac";
// if len[i] >= s.size() + 1, it denotes there is no matching
vector<vector<int>> len(s.size(), vector<int>(26, 0));
for (int i = 0; i < 26; ++i) {
int upperPos = s.size() * 2;
int lowerPos = s.size() * 2;
for (int j = s.size() - 1; j >= 0; --j) {
if (s[j] == 'A' + i) {
upperPos = j;
} else if (s[j] == 'a' + i) {
lowerPos = j;
}
len[j][i] = max(lowerPos - j + 1, upperPos - j + 1);
}
}
We also keep track of the count of characters.
// cnt[i][j] denotes the number of characters j in substring s[0..i-1]
// cnt[0][j] is always 0
vector<vector<int>> cnt(s.size() + 1, vector<int>(26, 0));
for (int i = 0; i < s.size(); ++i) {
for (int j = 0; j < 26; ++j) {
cnt[i + 1][j] = cnt[i][j];
if (s[i] == 'A' + j || s[i] == 'a' + j) {
++cnt[i + 1][j];
}
}
}
Then we can loop over s.
int m = s.size() + 1;
for (int i = 0; i < s.size(); ++i) {
bool done = false;
int minLen = 1;
while (!done && i + minLen <= s.size()) {
// execute at most 26 times, a new character must be added to change minLen
int prevMinLen = minLen;
done = true;
for (int j = 0; j < 26 && i + minLen <= s.size(); ++j) {
if (cnt[i + minLen][j] - cnt[i][j] > 0) {
// character j exists in the substring, have to find pair of it
minLen = max(minLen, len[i][j]);
}
}
if (prevMinLen != minLen) done = false;
}
// find overall minLen
if (i + minLen <= s.size())
m = min(m, minLen);
cout << minLen << '\n';
}
Output: (if i + minLen <= s.size()
, it is valid. Otherwise substring doesn't exist if starting at that position)
The invalid output difference is due to how the array len
is generated.
8
4
15
14
13
12
11
10
I'm not sure whether there is a simpler solution but it is the best I could think of right now. Time complexity: O(N) with a constant of 26 * 26
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With