Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast method to transform a string with about 150mb

I've been trying to decrease every single char value in a std::stringstream by 100:

std::string str = stream.str();

auto decrement = [](char c) { return c - 100; };

std::string out;
out.reserve(str.size());
std::transform(str.begin(), str.end(), std::back_inserter(out), decrement);

stream = std::stringstream(out);

But it took 7 minutes stuck on the std::transform instruction. That for a 150mb text file.

I'm not using an optimized build. This is the debug build. The goal is to be able to have the code running faster for debugging purposes. Release results are secondary for this question.

Any suggestions on how to improve efficiency?

like image 283
Alexandre Severino Avatar asked Jun 12 '15 17:06

Alexandre Severino


3 Answers

One thing I'd consider would be transforming it in place if you don't use your str for anything else. That way you write back to the same location you read from and might get better caching behaviour. Simply change

std::transform(str.begin(), str.end(), std::back_inserter(out), decrement);

to

std::transform(str.begin(), str.end(), str.begin(), decrement);

and you can get rid of your out string completely. The 3rd (destination) parameter is allowed to be the same as the first parameter.

Not only does that get rid completely of the extra 150MB string variable, you previously had to access two different locations in the memory that should be quite a bit apart. With reading from and writing back to the same place you make sure that there really is maximum use of the cache.

Of course this mutates str so it's only really useful if you don't need the original str variable for something else.

End result:

std::string str = stream.str();

auto decrement = [](char c) { return c - 100; };
std::transform(str.begin(), str.end(), str.begin(), decrement);
stream = std::stringstream(str);
like image 112
AliciaBytes Avatar answered Nov 05 '22 01:11

AliciaBytes


There are two obvious speedups.

The first is to transform in-place.

std::string str = stream.str();

auto decrement = [=](char c) { return c -= 100; };

std::transform(str.begin(), str.end(), str.begin(), decrement);

stream = std::stringstream(str);

which was covered by Raphael.

The second, which is only because you want DEBUG optimized speed, is to bypass possibly debug iterator checks.

std::string str = stream.str();

auto decrement = [=](char c) { return c -= 100; };

std::transform(&str[0], (&str[0])+str.size(), (&str[0]), decrement);

stream = std::stringstream(str);

here we replace begin() with &str[0], a raw pointer to the character buffer contents. If you are working with extremely strange basic_strings, use std::addressof instead of &.

In a system with a iterators encumbered with debugging instrumentation, this could be much faster. In an optimized build, I would expect it to be the same speed.

like image 45
Yakk - Adam Nevraumont Avatar answered Nov 05 '22 03:11

Yakk - Adam Nevraumont


Slightly less elegant but I think still acceptable (also depends on your target machine) would be use of sse intrinsics (SSE2) if you need extra speed (around 5x faster than solution presented by Raphael).

#include <emmintrin.h>

__m128i dec = _mm_set1_epi8(100);
size_t x = 0;
for (; x < str.size()-15; x+=16)
{
    __m128i sse = _mm_loadu_si128((__m128i*)&str[x]);
    _mm_storeu_si128((__m128i*)&str[x], _mm_sub_epi8(sse, dec));
}

for (; x < str.size(); ++x)
    str[x] -= 100;
like image 4
doqtor Avatar answered Nov 05 '22 03:11

doqtor