Why the c++'s implemented string::find()
doesn't use the KMP algorithm (and doesn't run in O(N + M)
) and runs in O(N * M)
? Is that corrected in C++0x? If the complexity of current find is not O(N * M)
, what is that?
so what algorithm is implemented in gcc? is that KMP? if not, why? I've tested that and the running time shows that it runs in O(N * M)
find(std::string(n,'a')) is O(n^2) or worse.
However, since std::string is a generalized container and it can't make any assumptions about the nature of the string it holds, you can reasonably assume that complexity will be O(n) in case when you search for a single char .
The std::string class manages the underlying storage for you, storing your strings in a contiguous manner. You can get access to this underlying buffer using the c_str() member function, which will return a pointer to null-terminated char array. This allows std::string to interoperate with C-string APIs.
Why the c++'s implemented string::substr() doesn't use the KMP algorithm (and doesn't run in O(N + M)) and runs in O(N * M)?
I assume you mean find()
, rather than substr()
which doesn't need to search and should run in linear time (and only because it has to copy the result into a new string).
The C++ standard doesn't specify implementation details, and only specifies complexity requirements in some cases. The only complexity requirements on std::string
operations are that size()
, max_size()
, operator[]
, swap()
, c_str()
and data()
are all constant time. The complexity of anything else depends on the choices made by whoever implemented the library you're using.
The most likely reason for choosing a simple search over something like KMP is to avoid needing extra storage. Unless the string to be found is very long, and the string to search contains a lot of partial matches, the time taken to allocate and free that would likely be much more than the cost of the extra complexity.
Is that corrected in c++0x?
No, C++11 doesn't add any complexity requirements to std::string
, and certainly doesn't add any mandatory implementation details.
If the complexity of current substr is not O(N * M), what is that?
That's the worst-case complexity, when the string to search contains a lot of long partial matches. If the characters have a reasonably uniform distribution, then the average complexity would be closer to O(N)
. So by choosing an algorithm with better worst-case complexity, you may well make more typical cases much slower.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With