Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Using `std::search` over `string::find`

Tags:

c++

string

I had a question about the use of std::search vs string::find for dealing with strings. I know it is often better to use a class specific member function algorithm over the standard library algorithm because it can optimize based on the class, but I was wondering whether it is reasonable to, say for the sake of consistency, use std::search with the iterators rather than string::find with indices.

Would it be a sin for me to do something like that or should I just stick to string::find? Are there any huge advantages of one over the other in terms of performance or style?

like image 892
gowrath Avatar asked Apr 27 '17 12:04

gowrath


People also ask

How do I find a character in a string?

The strchr() function finds the first occurrence of a character in a string. The character c can be the null character (\0); the ending null character of string is included in the search. The strchr() function operates on null-ended strings.

How do you find if a string contains a substring in C++?

Using find() function to check if string contains substring in C++. We can use string::find() that can return the first occurrence of the substring in the string. It returns the index from the starting position and the default value for this function is 0. It returns -1 if the substring is not present in the string.

What does string find return C++ if not found?

The . find() method returns the index of the first occurrence of the specified string or character. If no result is found, string::npos is returned instead.


2 Answers

Right now (27th April 2017), at least GCCs libstdc++ (which is also used by clang by default), implements std::string::find with a linear search and thus is much slower than using

std::string_view substr{"whatever"};
auto it = std::search(s.cbegin(), s.cend(),
                      std::boyer_moore_searcher(substr.begin(), substr.end())); 

The problem is that the Boyer-Moore searcher allocates memory for internal data structures, and thus can fail with a std::bad_alloc exception. However, std::string::find is marked noexcept, so using the already implemented Boyer-Moore searcher within std::string::find isn't straight-forward.

like image 200
Corristo Avatar answered Oct 13 '22 01:10

Corristo


string::find uses linear search but it is several times faster than Boyer Moore for some cases (with the latest patch). I submitted a patch (first-element then memcomp) to both libstdc++ and libc++ which improved string::find significantly. You can try the recent gcc (7.1) and you will get the improved performance. You can also measure the performance with the simple benchmarking suite I wrote: https://github.com/hiraditya/std-benchmark

Especially for smaller strings, by the time Boyer Moore is busy constructing internal data structure, (sub) linear string::find will be done. Also for parsing HTML etc., where most of the searches are mismatches, string::find should be faster.

commit fc7ebc4b8d9ad7e2891b7f72152e8a2b7543cd65
Author: redi <redi@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Mon Jan 9 13:05:58 2017 +0000

    PR66414 optimize std::string::find

    2017-01-09  Jonathan Wakely  <[email protected]>
            Aditya Kumar  <[email protected]>

        PR libstdc++/66414
        * include/bits/basic_string.tcc
        (basic_string::find(const CharT*, size_type, size_type)): Optimize.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244225 138bc75d-0d04-0410-961f-82ee72b054a4

PS: Using std::find will always be slower than the current std::string::find with the current implementation.

like image 21
A. K. Avatar answered Oct 12 '22 23:10

A. K.