I know this is bike shedding but is there a way to get the set of strings C, between two (sorted) sets A,B of strings, where B is a sub string of A, with a complexity better than of A.size * B.size * comp_substr
, as the naive solution I came up?
std::copy_if(devices.cbegin(), devices.cend(),
std::back_inserter(ports),
[&comport_keys] (const auto& v) {
return std::any_of(comport_keys.begin(),comport_keys.end(), [&v](auto& k) {
return v.find(k) != std::string::npos;
});
});
The easier case of just where B is a string of A, with std::set_intersection
would be pretty simple with a complexity of (A.size + B.size) * comp_substr
, with would be even better if one had to sort it before (n * log(n))
, but I don't know how to write the compare function for it, or rather the sort of both.
#define BOOST_TEST_MODULE My Test
#include <boost/test/included/unit_test.hpp>
#include <vector>
#include <string>
#include <algorithm>
#include <iterator>
#include <set>
BOOST_AUTO_TEST_CASE(TEST) {
std::vector<std::string> devices{
"tty1",
"ttyOfk",
"ttyS05",
"bsd",
}, ports{};
const std::set<std::string> comport_keys{
"ttyS",
"ttyO",
"ttyUSB",
"ttyACM",
"ttyGS",
"ttyMI",
"ttymxc",
"ttyAMA",
"ttyTHS",
"ircomm",
"rfcomm",
"tnt",
"cu",
"ser",
};
std::sort(devices.begin(), devices.end());
std::set_intersection(devices.cbegin(), devices.cend(),
comport_keys.cbegin(), comport_keys.cend(),
std::back_inserter(ports),
[&comport_keys] (auto a, auto b) {
return a.find(b) != std::string::npos; //This is wrong
});
const std::vector<std::string>test_set {
"ttyOfk",
"ttyS05",
};
BOOST_TEST(ports == test_set);
}
Say we have two sets of strings: A and B. B contains a set of potential prefixes for the strings in A. So we want to take each element a from A and try to match it with all potential prefixes of B. If we find a matching prefix, we store our result a in C. The trivial solution works in O(|A| |B|). You ask: Can we optimize this?
You said, B is already sorted. Then we can build a generalised prefix tree on B in linear time and query it with each string in A to solve it in O(|A|+|B). The problem is, sorting B takes O(|B| log|B|) and the tree is non-trivial.
So I provide a simple solution with O(|A| log|B|) which is more efficient than O(|A|+|B|) if |A| is small, like in your example. B is still assumed to be sorted (the sorting is really the upper bound here...).
bool
validate_prefixes(const std::multiset<std::string>& keys) {
auto itb = keys.begin(), it = itb;
if(it == keys.end()) return false; //no keys
for(++it; it != keys.end(); ++it) {
if( (*it).find(*itb) != std::string::npos ) return false; //redundant keys
itb++;
}
return true;
}
bool
copy_from_intersecting_prefixes(const std::vector<std::string>& data,
std::multiset<std::string>& prefix_keys,
std::vector<std::string>& dest, bool check = false) {
if(check && !validate_prefixes(prefix_keys)) return false;
for(auto it_data = data.begin(); it_data != data.end(); ++it_data) {
auto ptr = prefix_keys.insert(*it_data), ptrb = ptr;
if(ptrb != prefix_keys.begin()) { //if data is at the start, there is no prefix
if( (*ptr).find(*(--ptrb)) != std::string::npos ) dest.push_back(*it_data);
}
prefix_keys.erase(ptr);
} //Complexity: O(|data|) * O( log(|prefix_keys|) ) * O(substr) = loop*insert*find
return check;
}
//.... in main()
std::multiset<std::string> tmp(comport_keys.begin(), comport_keys.end()); //copy const
copy_from_intersecting_prefixes(devices, tmp, ports);
validate_prefixes
enforces a precondition. It checks if we have at least one valid prefix and that the keys are not self-matching. E.g. we could have keys cu
and cu2
, but cu
is a prefix for cu2
, so they can't be both valid prefixes, either cu
is too general or cu2
too specific. If we try to match cu3
with cu
and cu2
this is inconsistent. Here validate_prefixes(comport_keys)
returns true
, but it might be nice to check it automatically.
copy_from_intersecting_prefixes
does the actual asked work. It iterates over A, and puts a inside the ordered B. The prefix is smaller than prefix+ending, so if a corresponding prefix exists, it will occur before a in B. Because the keys are not self-matching, we know that the prefix will precede a in B. So we decrement the iterator from a and compare. Note that prefix might equal a, so we need multiset.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With