Suppose I have
std::vector<T1> vec1 {/* filled with T1's */};
std::vector<T2> vec2 {/* filled with T2's */};
and some function T1 f(T2)
which could of course be a lambda. What is the optimal way to concatenate vec1
and vec2
whilst applying f
to each T2
in vec2
?
The apparently obvious solution is std::transform
, i.e.
vec1.reserve(vec1.size() + vec2.size());
std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), f);
but I say this is not optimal as std::back_inserter
must make an unnecessary capacity check on each inserted element. What would be optimal is something like
vec1.insert(vec1.end(), vec2.begin(), vec2.end(), f);
which could get away with a single capacity check. Sadly this is not valid C++. Essentially this is the same reason why std::vector::insert
is optimal for vector concatenation, see this question and the comments in this question for further discussion on this point.
So:
std::transform
the optimal method using the STL?insert
function described above was left out of the STL?UPDATE
I've had a go at verifying if the multiple capacity checks do have any noticeable cost. To do this I basically just pass the id function (f(x) = x
) to the std::transform
and push_back
methods discussed in the answers. The full code is:
#include <iostream>
#include <vector>
#include <iterator>
#include <algorithm>
#include <cstdint>
#include <chrono>
#include <numeric>
#include <random>
using std::size_t;
std::vector<int> generate_random_ints(size_t n)
{
std::default_random_engine generator;
auto seed1 = std::chrono::system_clock::now().time_since_epoch().count();
generator.seed((unsigned) seed1);
std::uniform_int_distribution<int> uniform {};
std::vector<int> v(n);
std::generate_n(v.begin(), n, [&] () { return uniform(generator); });
return v;
}
template <typename D=std::chrono::nanoseconds, typename F>
D benchmark(F f, unsigned num_tests)
{
D total {0};
for (unsigned i = 0; i < num_tests; ++i) {
auto start = std::chrono::system_clock::now();
f();
auto end = std::chrono::system_clock::now();
total += std::chrono::duration_cast<D>(end - start);
}
return D {total / num_tests};
}
template <typename T>
void std_insert(std::vector<T> vec1, const std::vector<T> &vec2)
{
vec1.insert(vec1.end(), vec2.begin(), vec2.end());
}
template <typename T1, typename T2, typename UnaryOperation>
void push_back_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
vec1.reserve(vec1.size() + vec2.size());
for (const auto& x : vec2) {
vec1.push_back(op(x));
}
}
template <typename T1, typename T2, typename UnaryOperation>
void transform_concat(std::vector<T1> vec1, const std::vector<T2> &vec2, UnaryOperation op)
{
vec1.reserve(vec1.size() + vec2.size());
std::transform(vec2.begin(), vec2.end(), std::back_inserter(vec1), op);
}
int main(int argc, char **argv)
{
unsigned num_tests {1000};
size_t vec1_size {10000000};
size_t vec2_size {10000000};
auto vec1 = generate_random_ints(vec1_size);
auto vec2 = generate_random_ints(vec1_size);
auto f_std_insert = [&vec1, &vec2] () {
std_insert(vec1, vec2);
};
auto f_push_back_id = [&vec1, &vec2] () {
push_back_concat(vec1, vec2, [] (int i) { return i; });
};
auto f_transform_id = [&vec1, &vec2] () {
transform_concat(vec1, vec2, [] (int i) { return i; });
};
auto std_insert_time = benchmark<std::chrono::milliseconds>(f_std_insert, num_tests).count();
auto push_back_id_time = benchmark<std::chrono::milliseconds>(f_push_back_id, num_tests).count();
auto transform_id_time = benchmark<std::chrono::milliseconds>(f_transform_id, num_tests).count();
std::cout << "std_insert: " << std_insert_time << "ms" << std::endl;
std::cout << "push_back_id: " << push_back_id_time << "ms" << std::endl;
std::cout << "transform_id: " << transform_id_time << "ms" << std::endl;
return 0;
}
Compiled with:
g++ vector_insert_demo.cpp -std=c++11 -O3 -o vector_insert_demo
Output:
std_insert: 44ms
push_back_id: 61ms
transform_id: 61ms
The compiler will have inlined the lambda, so that cost can be safely be discounted. Unless anyone else has a viable explanation for these results (or is willing to check the assembly), I think it's reasonable to conclude there is a noticeable cost of the multiple capacity checks.
The concatenation of vectors can be done by using combination function c. For example, if we have three vectors x, y, z then the concatenation of these vectors can be done as c(x,y,z). Also, we can concatenate different types of vectors at the same time using the same same function.
Concatenate function: This is a function that combines its arguments. This method combines the arguments and results in a vector. All the arguments are converted to a common type that is the return value type. This function is also used to convert the array into vectors.
C++ std::vector Concatenating VectorsOne std::vector can be append to another by using the member function insert() : std::vector<int> a = {0, 1, 2, 3, 4}; std::vector<int> b = {5, 6, 7, 8, 9}; a. insert(a. end(), b.
Appending a vector elements to another vector To insert/append a vector's elements to another vector, we use vector::insert() function. Syntax: //inserting elements from other containers vector::insert(iterator position, iterator start_position, iterator end_position);
UPDATE: The performance difference is due to the reserve()
calls, which, in libstdc++ at least, make the capacity be exactly what you request instead of using the exponential growth factor.
I did some timing tests, with interesting results. Using std::vector::insert
along with boost::transform_iterator
was the fastest way I found by a large margin:
Version 1:
void
appendTransformed1(
std::vector<int> &vec1,
const std::vector<float> &vec2
)
{
auto v2begin = boost::make_transform_iterator(vec2.begin(),f);
auto v2end = boost::make_transform_iterator(vec2.end(),f);
vec1.insert(vec1.end(),v2begin,v2end);
}
Version 2:
void
appendTransformed2(
std::vector<int> &vec1,
const std::vector<float> &vec2
)
{
vec1.reserve(vec1.size()+vec2.size());
for (auto x : vec2) {
vec1.push_back(f(x));
}
}
Version 3:
void
appendTransformed3(
std::vector<int> &vec1,
const std::vector<float> &vec2
)
{
vec1.reserve(vec1.size()+vec2.size());
std::transform(vec2.begin(),vec2.end(),std::inserter(vec1,vec1.end()),f);
}
Timing:
Version 1: 0.59s Version 2: 8.22s Version 3: 8.42s
main.cpp:
#include <algorithm>
#include <cassert>
#include <chrono>
#include <iterator>
#include <iostream>
#include <random>
#include <vector>
#include "appendtransformed.hpp"
using std::cerr;
template <typename Engine>
static std::vector<int> randomInts(Engine &engine,size_t n)
{
auto distribution = std::uniform_int_distribution<int>(0,999);
auto generator = [&]{return distribution(engine);};
auto vec = std::vector<int>();
std::generate_n(std::inserter(vec,vec.end()),n,generator);
return vec;
}
template <typename Engine>
static std::vector<float> randomFloats(Engine &engine,size_t n)
{
auto distribution = std::uniform_real_distribution<float>(0,1000);
auto generator = [&]{return distribution(engine);};
auto vec = std::vector<float>();
std::generate_n(std::inserter(vec,vec.end()),n,generator);
return vec;
}
static auto
appendTransformedFunction(int version) ->
void(*)(std::vector<int>&,const std::vector<float> &)
{
switch (version) {
case 1: return appendTransformed1;
case 2: return appendTransformed2;
case 3: return appendTransformed3;
default:
cerr << "Unknown version: " << version << "\n";
exit(EXIT_FAILURE);
}
return 0;
}
int main(int argc,char **argv)
{
if (argc!=2) {
cerr << "Usage: appendtest (1|2|3)\n";
exit(EXIT_FAILURE);
}
auto version = atoi(argv[1]);
auto engine = std::default_random_engine();
auto vec1_size = 1000000u;
auto vec2_size = 1000000u;
auto count = 100;
auto vec1 = randomInts(engine,vec1_size);
auto vec2 = randomFloats(engine,vec2_size);
namespace chrono = std::chrono;
using chrono::system_clock;
auto appendTransformed = appendTransformedFunction(version);
auto start_time = system_clock::now();
for (auto i=0; i!=count; ++i) {
appendTransformed(vec1,vec2);
}
auto end_time = system_clock::now();
assert(vec1.size() == vec1_size+count*vec2_size);
auto sum = std::accumulate(vec1.begin(),vec1.end(),0u);
auto elapsed_seconds = chrono::duration<float>(end_time-start_time).count();
cerr << "Using version " << version << ":\n";
cerr << " sum=" << sum << "\n";
cerr << " elapsed: " << elapsed_seconds << "s\n";
}
Compiler: g++ 4.9.1
Options: -std=c++11 -O2
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With