I have to replace a few (fixed) amount of characters in a long string: I'm wondering what is the fastest, but standard compliant way.
Here is a sample code with 6 different methods; in the comment of the method I've added the time in milliseconds to execute the operation 1 million times in a test environment, with optimizations enabled.
const char* pluto = "Cia1234567Ciao!";
std::string rep = "87654321";
std::string r1 = pluto, r2 = pluto, r3 = pluto, r4 = pluto, r5 = pluto, r6 = pluto;
// (1) 300 msec
r1.replace(3, 7, rep.substr(1));
// (2) 40 msec
std::copy(rep.begin() + 1, rep.end(), r2.begin() + 3);
// (3) 32 msec
for (int i = 1; i < 8; ++i)
r3[2 + i] = rep[i];
// (4) 14 msec
{
const char *c = rep.c_str() + 1;
for (int i = 0; i < 7; ++i)
r4[3 + i] = *c++;
}
// (5) 3 msec (BEST)
memcpy(&r5[3], &rep[1], 7);
// (6) 100 msec
r6.replace(3, 7, rep.c_str() + 1);
So the fastest way seems (5), but I fear that this method may not work correctly with the "copy-on-write" std::string
optimization that many compilers use.
IMHO (5) is also the more readable.
I wonder why (4) is twice as fast as (3), I thought that operator[]
of std::string
was quite optimized...
UPDATE:
After reading comments I've updated my code to use the google benchmark library and the results of (3) and (4) seems to be the same, the other differences still apply:
Run on (2 X 3000 MHz CPU s)
2015-11-24 14:46:50
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------
(1) bench_replace_substr 293 264 2651515
(2) bench_std_copy 39 39 19662921
(3) bench_op_bracket 15 15 39772727
(4) bench_op_bracket_2 15 15 44871795
(5) bench_memcpy 4 4 75000000
(6) bench_replace 80 80 8333333
So the differences in (3) and (4) are gone, but the rest of the results are the same :)
The method using memcpy
is standard-compliant at least since C++11 because
As explained in this answer copy-on-write implementation of std::string
is not allowed, because it violates standard invalidation of iterators/references requirements.
std::string
's characters are stored in contiguous memory, quoting 21.4.1.5:
The char-like objects in a basic_string object shall be stored contiguously. That is, for any basic_string object s, the identity &*(s.begin() + n) == &*s.begin() + n shall hold for all values of n such that 0 <= n < s.size().
So it is the fastest standard-compliant of the methods in your list (at least according to your benchmark results).
In fact, this should be safe even with a non-standard-compliant implementation that does copy-on-write because non-const operator[]
should make a copy of the string, for example:
std::string s1("foo");
std::string s2 = s1;
std::cout << static_cast<const void*>(s1.data()) << " "
<< static_cast<const void*>(s2.data()) << "\n";
s2[0];
std::cout << static_cast<const void*>(s1.data()) << " "
<< static_cast<const void*>(s2.data()) << "\n";
prints
0x1782028 0x1782028
0x1782028 0x1782058
when I compile it with gcc 4.8.4 and a fairly old version of libstdc++ and run. Note that the pointers are different after the call to non-const operator[]
meaning that the data has been copied.
Knowing that non-const operator[]
does some checks in COW implementation, it might be possible to speed up even more by calling const operator[]
:
const std::string &crep = rep;
memcpy(&r5[3], &crep[1], 7);
which is indeed faster on my system:
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------
bench_memcpy_const 2 2 314215561
bench_memcpy 3 3 276899830
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With