Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

c++ efficiently get substring of string with index

Tags:

c++

c++11

substr

In my project i have to iterate over a large string starting from index=0 and get length k substring. I've implemented string::substr() and wonder if there are other efficient methods.

For example :

std::string S ="ABCDEFGHIJKLMN"

i need to get all substrings of length =5 starting from the beginning of S.Just like "ABCDE", "BCDEF", "CDEFG" and so on..

My implementation is like below:

    void geekfunc(std::string &str)
{
    unsigned int index=0;
    for (; index<=(str.size()-K);++index)
    {
        ++myseqmap[str.substr(index,K)];
    }
}

This function is being called ten million times and I welcome other methods to try.

like image 797
Krcn U Avatar asked Sep 18 '25 11:09

Krcn U


2 Answers

If you are using C++17, you can use string_view as your parameter and map key type. This way you won't make copies of the string contents every time you call substr. Just make sure the string you pass to the function is not destroyed or modified while your map it still in use.

std::map<std::string_view, std::size_t> myseqmap;

void geekfunc(std::string_view str)
{
    unsigned int index=0;
    for (; index<=(str.size()-K);++index)
    {
        ++myseqmap[str.substr(index,K)];
    }
}
like image 114
Joseph Thomson Avatar answered Sep 20 '25 02:09

Joseph Thomson


Provided you actually need to create copies of the substrings (which string::substr does) I believe you cannot solve this issue with less than Omega(m) calls to the memory manager and Omega(m * k) copying steps total, where m = n - k + 1. This is because the standard requires each string to manage its own memory. Sharing (such as with the copy-on-write idiom) is not permitted, thus each substring will copy its contents from the original.

If a copy is not required and your compiler already supplies std::string_view you could try using that. Unlike string, a string_view only holds a pointer to a character and a size (which is exactly what you are creating your substrings from anyways). The required pointer can be acquired using string::data.

However, when using string_view you have to make sure the original string remains in scope for as long as the container holding the substrings and that it is not altered after the substrings have been created, as this may invalidate the pointers held by the string_views. These can be adressed by wrapping everything in a class together like this:

struct substrings{
    const std::string original;
    container<string_view> substrings;
};

Where container is any container of your choice.

like image 41
Pandatyr Avatar answered Sep 20 '25 03:09

Pandatyr