In my project i have to iterate over a large string starting from index=0 and get length k substring. I've implemented string::substr() and wonder if there are other efficient methods.
For example :
std::string S ="ABCDEFGHIJKLMN"
i need to get all substrings of length =5 starting from the beginning of S.Just like
"ABCDE"
, "BCDEF"
, "CDEFG"
and so on..
My implementation is like below:
void geekfunc(std::string &str)
{
unsigned int index=0;
for (; index<=(str.size()-K);++index)
{
++myseqmap[str.substr(index,K)];
}
}
This function is being called ten million times and I welcome other methods to try.
If you are using C++17, you can use string_view
as your parameter and map key type. This way you won't make copies of the string contents every time you call substr
. Just make sure the string you pass to the function is not destroyed or modified while your map it still in use.
std::map<std::string_view, std::size_t> myseqmap;
void geekfunc(std::string_view str)
{
unsigned int index=0;
for (; index<=(str.size()-K);++index)
{
++myseqmap[str.substr(index,K)];
}
}
Provided you actually need to create copies of the substrings (which string::substr does) I believe you cannot solve this issue with less than Omega(m)
calls to the memory manager and Omega(m * k)
copying steps total, where m = n - k + 1
. This is because the standard requires each string to manage its own memory. Sharing (such as with the copy-on-write idiom) is not permitted, thus each substring will copy its contents from the original.
If a copy is not required and your compiler already supplies std::string_view you could try using that. Unlike string
, a string_view
only holds a pointer to a character and a size (which is exactly what you are creating your substrings from anyways). The required pointer can be acquired using string::data.
However, when using string_view
you have to make sure the original string remains in scope for as long as the container holding the substrings and that it is not altered after the substrings have been created, as this may invalidate the pointers held by the string_view
s. These can be adressed by wrapping everything in a class together like this:
struct substrings{
const std::string original;
container<string_view> substrings;
};
Where container
is any container of your choice.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With