Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Performance of a trimming function

My old Trimming function:

string TailTrimString (const string & sSource, const char *chars) {
  size_t End = sSource.find_last_not_of(chars);
  if (End == string::npos) {
    // only "*chars"
    return "";
  }
  if (End == sSource.size() - 1) {
    // noting to trim
    return sSource;
  }
  return sSource.substr(0, End + 1);
}

Instead of it I've decided to use boost, and wrote the trivial:

string TailTrimString (const string & sSource, const char *chars) {
    return boost::algorithm::trim_right_copy_if(sSource,boost::algorithm::is_any_of(chars));
}

And I was amazed to find out that the new function works much slower. I've done some profiling, and I see that the function is_any_of is very slow.

Is it possible that boost's implementation works slower than my quite straightforward implementation? Is there anything I should use instead of is_any_of in order to improve the performance?

I also found a discussion on this matter in the boost's mailing list, but I am still not sure on how can I improve the performance of my code.

The boost version that I use is 1.38, which is quite old, but I guess this code didn't change too much since then.

Thank you.

like image 266
Igor Avatar asked Nov 30 '10 14:11

Igor


3 Answers

it possible that boost's implementation works slower than my quite straightforward implementation?

Of course.

Is there anything I should use instead of is_any_of in order to improve the performance?

Yeah -- your original code. You said nothing about it having a defect, or the reason why you re-implemented it using boost. If there was no defect in the original code, then there was no valid reason to chuck the original implementation.

Introducing Boost to a codebase makes sense. It brings a lot of functionality that can be helpful. But gutting a function for the sole purpose of using a new technology is a big rookie mistake.

EDIT:

In response to your comment:

I still don't understand, why boost's performance is worse.

A hand-crafted function that is designed to do one specific job for one specific application will often be faster than a generic solution. Boost is a great library of generic tools that can save a lot of programming and a lot of defects. But its generic. You may only need to trim your string in a specific way, but Boost handles everything. That takes time.

like image 87
John Dibling Avatar answered Oct 13 '22 08:10

John Dibling


In answer to your question about the relative performance, the std::string::find_last_not_of, will wrap C string routines (such as strcspan), and these are very fast, however boost::algorithm::is_any_of uses (probably used to use, I'd hazard that in the later versions this has changed!) a std::set for the set of characters to look for and does a check in this set for each character - which will not be anywhere near as fast!

EDIT: just to add an echo, your function works, it's not broken, it's not slow, so don't bother changing it...

like image 30
Nim Avatar answered Oct 13 '22 08:10

Nim


In answer to your question about the relative performance.

You are using boost::algorithm::trim_right_copy_if which, according to the name, creates a copy of the input before trimming. Try using boost::algorithm::trim_right_if to see if that has better performance. This function will perform the operation in-place instead of on a new string.

like image 42
KitsuneYMG Avatar answered Oct 13 '22 08:10

KitsuneYMG