Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fastest (and safest) method to replace characters in std::string

I have to replace a few (fixed) amount of characters in a long string: I'm wondering what is the fastest, but standard compliant way.

Here is a sample code with 6 different methods; in the comment of the method I've added the time in milliseconds to execute the operation 1 million times in a test environment, with optimizations enabled.

const char* pluto = "Cia1234567Ciao!";
std::string rep = "87654321";
std::string r1 = pluto, r2 = pluto, r3 = pluto, r4 = pluto, r5 = pluto, r6 = pluto;

// (1) 300 msec
r1.replace(3, 7, rep.substr(1));  

// (2) 40 msec
std::copy(rep.begin() + 1, rep.end(), r2.begin() + 3);

// (3) 32 msec
for (int i = 1; i < 8; ++i)
    r3[2 + i] = rep[i];

// (4) 14 msec
{
    const char *c = rep.c_str() + 1;
    for (int i = 0; i < 7; ++i)
        r4[3 + i] = *c++;
}

// (5) 3 msec (BEST)
memcpy(&r5[3], &rep[1], 7);

// (6) 100 msec
r6.replace(3, 7, rep.c_str() + 1);

So the fastest way seems (5), but I fear that this method may not work correctly with the "copy-on-write" std::string optimization that many compilers use.

IMHO (5) is also the more readable.

I wonder why (4) is twice as fast as (3), I thought that operator[] of std::string was quite optimized...


UPDATE:

After reading comments I've updated my code to use the google benchmark library and the results of (3) and (4) seems to be the same, the other differences still apply:

Run on (2 X 3000 MHz CPU s)
2015-11-24 14:46:50
Benchmark                   Time(ns)    CPU(ns) Iterations
-----------------------------------------------------------
(1) bench_replace_substr        293        264     2651515
(2) bench_std_copy               39         39    19662921
(3) bench_op_bracket             15         15    39772727
(4) bench_op_bracket_2           15         15    44871795
(5) bench_memcpy                  4          4    75000000
(6) bench_replace                80         80     8333333

So the differences in (3) and (4) are gone, but the rest of the results are the same :)

like image 897
gabry Avatar asked Nov 24 '15 12:11

gabry


1 Answers

The method using memcpy is standard-compliant at least since C++11 because

  1. As explained in this answer copy-on-write implementation of std::string is not allowed, because it violates standard invalidation of iterators/references requirements.

  2. std::string's characters are stored in contiguous memory, quoting 21.4.1.5:

    The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().

So it is the fastest standard-compliant of the methods in your list (at least according to your benchmark results).

In fact, this should be safe even with a non-standard-compliant implementation that does copy-on-write because non-const operator[] should make a copy of the string, for example:

std::string s1("foo");
std::string s2 = s1;
std::cout << static_cast<const void*>(s1.data()) << " "
          << static_cast<const void*>(s2.data()) << "\n";
s2[0];
std::cout << static_cast<const void*>(s1.data()) << " "
          << static_cast<const void*>(s2.data()) << "\n";

prints

0x1782028 0x1782028
0x1782028 0x1782058

when I compile it with gcc 4.8.4 and a fairly old version of libstdc++ and run. Note that the pointers are different after the call to non-const operator[] meaning that the data has been copied.

Knowing that non-const operator[] does some checks in COW implementation, it might be possible to speed up even more by calling const operator[]:

const std::string &crep = rep;
memcpy(&r5[3], &crep[1], 7);

which is indeed faster on my system:

Benchmark              Time(ns)    CPU(ns) Iterations
-----------------------------------------------------
bench_memcpy_const            2          2  314215561 
bench_memcpy                  3          3  276899830 
like image 85
vitaut Avatar answered Sep 23 '22 19:09

vitaut