Given a string A and another string B. Find whether any permutation of B exists as a substring of A.
For example,
if A = "encyclopedia"
if B="dep" then return true as ped is a permutation of dep and ped is a substring of A.
My solution->
if length(A)=n and length(B)=m
I did this in 0((n-m+1)*m) by sorting B and then checking A
with window size of m each time.
I need to find a better and a faster solution.
This answer proposes a fixed implementation of the idea proposed by @Ephraim in his own answer.
The original answer confuses the multiplication property of a given set of primes with addition. The fixed statement would be:
Let S = {A_1, ..., A_n} be a multiset list of size N that contains only prime numbers. Let the product of the numbers in S equal some integer Q. Then S is the only possible entirely prime multiset of size N, whose elements can multiply to Q.
In order to avoid numerical overflows, the implementation uses infinite precision arithmetic based on the C++
library libgmpxx.
Under the assumption that the comparison between two numbers is in O(1)
, then the solution is in O(|A| + |B|)
. The previous assumption might actually not be the case for inputs where |B|
is large enough. When |B| > |A|
the function returns in O(1)
.
Example:
#include <iostream>
#include <string>
#include <gmpxx.h>
static int primes[] =
{
2, 3, 5, 7, 11, 13, 17, 19, 23, 29,
31, 37, 41, 43, 47, 53, 59, 61, 67, 71,
73, 79, 83, 89, 97, 101, 103, 107, 109, 113,
127, 131, 137, 139, 149, 151, 157, 163, 167, 173,
179, 181, 191, 193, 197, 199, 211, 223, 227, 229,
233, 239, 241, 251, 257, 263, 269, 271, 277, 281,
283, 293, 307, 311, 313, 317, 331, 337, 347, 349,
353, 359, 367, 373, 379, 383, 389, 397, 401, 409,
419, 421, 431, 433, 439, 443, 449, 457, 461, 463,
467, 479, 487, 491, 499, 503, 509, 521, 523, 541,
547, 557, 563, 569, 571, 577, 587, 593, 599, 601,
607, 613, 617, 619, 631, 641, 643, 647, 653, 659,
661, 673, 677, 683, 691, 701, 709, 719, 727, 733,
739, 743, 751, 757, 761, 769, 773, 787, 797, 809,
811, 821, 823, 827, 829, 839, 853, 857, 859, 863,
877, 881, 883, 887, 907, 911, 919, 929, 937, 941,
947, 953, 967, 971, 977, 983, 991, 997, 1009, 1013,
1019, 1021, 1031, 1033, 1039, 1049, 1051, 1061, 1063, 1069,
1087, 1091, 1093, 1097, 1103, 1109, 1117, 1123, 1129, 1151,
1153, 1163, 1171, 1181, 1187, 1193, 1201, 1213, 1217, 1223,
1229, 1231, 1237, 1249, 1259, 1277, 1279, 1283, 1289, 1291,
1297, 1301, 1303, 1307, 1319, 1321, 1327, 1361, 1367, 1373,
1381, 1399, 1409, 1423, 1427, 1429, 1433, 1439, 1447, 1451,
1453, 1459, 1471, 1481, 1483, 1487, 1489, 1493, 1499, 1511,
1523, 1531, 1543, 1549, 1553, 1559, 1567, 1571, 1579, 1583,
1597, 1601, 1607, 1609, 1613, 1619
};
mpz_class prime_hash(std::string const &str, size_t start, size_t end)
{
mpz_class hash(1);
for (size_t i = start; i < end; ++i) {
hash *= primes[(size_t) str.at(i)];
}
return hash;
}
void print_all_permutations(std::string const &big, std::string const &small)
{
const size_t big_len = big.length();
const size_t small_len = small.length();
if (small_len <= 0 || big_len < small_len) {
// no possible permutation!
return;
}
// O(small_len)
mpz_class small_hash = prime_hash(small, 0, small_len);
mpz_class curr_hash = prime_hash(big, 0, small_len);
// O(big_len)
size_t left_idx = 0;
do {
if (curr_hash == small_hash) {
std::cout << "Permutation `" << big.substr(left_idx, small_len)
<< "' of `" << small
<< "' at index " << left_idx
<< " of `" << big
<< "'." << std::endl;
}
curr_hash /= primes[(size_t) big.at(left_idx)];
if (left_idx + small_len < big_len) {
curr_hash *= primes[(size_t) big.at(left_idx + small_len)];
}
++left_idx;
} while (left_idx + small_len < big_len);
}
int main(int argc, char *argv[])
{
// @user2826957's input
print_all_permutations("encyclopedia", "dep");
// @Ephraim's input
print_all_permutations("abcdabcd", "abcd");
// @Sam's input
print_all_permutations("cbabadcbbabbc", "abbc");
// @butt0s input
print_all_permutations("", "");
print_all_permutations("", "a");
print_all_permutations("a", "");
print_all_permutations("a", "a");
return 0;
}
The example is compiled with:
~$ g++ permstr.cpp -lgmpxx -lgmp -o run
~$ ./run
Permutation `ped' of `dep' at index 7 of `encyclopedia'.
Permutation `abcd' of `abcd' at index 0 of `abcdabcd'.
Permutation `bcda' of `abcd' at index 1 of `abcdabcd'.
Permutation `cdab' of `abcd' at index 2 of `abcdabcd'.
Permutation `dabc' of `abcd' at index 3 of `abcdabcd'.
Permutation `cbab' of `abbc' at index 0 of `cbabadcbbabbc'.
Permutation `cbba' of `abbc' at index 6 of `cbabadcbbabbc'.
Permutation `a' of `a' at index 0 of `a'.
There is a simpler solution to this problem which can be done in linear time.
Here: n = A.size (), m = B.size ()
The idea is to use hashing.
First we hash the characters of string B.
Suppose: B = "dep"
Now we run a loop over the string 'A' for each window of size 'm'.
Suppose: A = "encyclopedia"
First window of size 'm' will have characters {e, n, c}. We will hash them now.
Now we check if the frequency of each character from both the arrays (hash_B [] and win []) are same. Note: Maximum size of hash_B [] or win [] is 26.
If they are not same we shift our window.
After shifting the window we decrease the count of win ['e'] by 1 and increase the count of win ['y'] by 1.
During the seventh shift, the status of your win array is:
which is same as the hash_B array. So, Print "SUCCESS" and exit.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With