From a recent interview of Amazon, I found the following question. I could not figure out an effective way to solve it.
The problem reads as:
Given an array of strings, you need to find the longest running sequence of a character among all possible permutations of the strings in the array.
INPUT :
ab
ba
aac
OUTPUT :
a,3
Note: From the input and output set, I think the permutation of the individual strings is not to be done.
Would really appreciate if somebody can help. Thanks.
Lovely question. So many corner cases. I'm guessing that the whole point of this interview question is to see how many corner cases you come up.
I hope I haven't missed any.
There are essentially two ways that a character sequence could be the solution to this problem:
1) It is an interior character sequence (eg. adddddddddddddddddb
)
2) It is the concatentation of a suffix, the entire collection of strings consisting only of that character, and a prefix. In this case, no string may be used more than once, including the case where a character is the suffix and the prefix of the same string. (To avoid resuing the homogenous strings, the suffixes and prefixes must be strict; i.e. not the entire string).
Case 1 is easy. We simply remember a single character and sequence length, and the current character and sequence length. As we read in the strings, if the current character/sequence length is longer than the maximum, we replace the maximum. We don't need to worry about it conflicting with the computations done for Case 2, because it won't affect the result.
Case 2 is a lot more work. For each character, we need to keep some data. We can either use a fixed-size array, one entry per character in the alphabet, if the alphabet is small, or we can use a hash-table of characters. Both are O(1)
on average; since the number of characters we'll deal with cannot be larger than the total number of characters in all the strings, the size requirement of the hash-table can be thought of as O(N)
. (In fact, it is limited by the size of the alphabet, so just as with the fixed-size array, the storage requirement is technically O(1)
but in the case of Unicode, for example, the constant is rather large.)
Now, what data do we need? Strings which are just a repetition of a single character are easy; we need the total length of those strings. So every time we find such a string, we can add its length to the total length member of the entry in our per-character data.
For (strict) suffixes and prefixes, it seems like we only need to maintain a maximum length for each. But what if we encounter a string whose prefix and suffix sequences are the same character, and both of the sequences are longer than any that we've handled previously? We can't use the string as both suffix and prefix. Fortunately, there is a simple answer: we keep three values: maximum_prefix, maximum_suffix, maximum_sum. If we're updating the table after reading a word, and it turns out that the same character is both prefix and suffix, we update the three values as follows:
maximum_sum = max(maximum_sum,
prefix_length + maximum_suffix,
suffix_length + maximum_prefix)
maximum_prefix = max(maximum_prefix, prefix_length)
maximum_suffix = max(maximum_suffix, suffix_length)
Note that the above pseudo-code works just fine (if a bit wastefully) if either prefix_length or suffix_length is 0.
So that's a total of four per-character values: homogenous_length, maximum_sum, maximum_prefix, maximum_suffix
. At the end of the scan, we need to find the character for which homogenous_length + maximum_sum
is the greatest; we can do that by a simple scan over the character table.
Total processing time is O(1)
per character (for the initial scan), which is O(N)
(where N
is the total number of characters in the problem, plus O(max(N, |A|))
for the final scan of the character table (|A|
is the size of the alphabet); in other words, O(N)
. Space requirements were described above.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With