Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why would array<T, N> ever be slower than vector<T>?

Today I decided to benchmark and compare some differences in gcc optimizability of std::vector and std::array. Generally, I found what I expected: performing a task on each of a collection of short arrays is much faster than performing the tasks on a collection equivalent vectors.

However, I found something unexpected: using std::vector to store the collection of arrays is faster than using std::array. Just in case it was the result of some artifact of a large amount of data on the stack, I also tried allocating it as an array on the heap and in a C-style array on the heap (but the results still resemble an array of arrays on the stack and a vector of arrays).

Any idea why std::vector would ever outperform std::array (on which the compiler has more compile-time information)?

I compiled using gcc-4.7 -std=c++11 -O3 (gcc-4.6 -std=c++0x -O3 should also result in this conundrum). Runtimes were computed using the bash-native time command (user time).

Code:

#include <array>
#include <vector>
#include <iostream>
#include <assert.h>
#include <algorithm>

template <typename VEC>
double fast_sq_dist(const VEC & lhs, const VEC & rhs) {
  assert(lhs.size() == rhs.size());
  double result = 0.0;
  for (int k=0; k<lhs.size(); ++k) {
    double tmp = lhs[k] - rhs[k];
    result += tmp * tmp;
  }
  return result;
}

int main() {
  const std::size_t K = 20000;
  const std::size_t N = 4;

  // declare the data structure for the collection
  // (uncomment exactly one of these to time it)

  // array of arrays
  // runtime: 1.32s
  std::array<std::array<double, N>, K > mat;

  // array of arrays (allocated on the heap)
  // runtime: 1.33s
  //  std::array<std::array<double, N>, K > & mat = *new std::array<std::array<double, N>, K >;

  // C-style heap array of arrays
  // runtime: 0.93s
  //  std::array<double, N> * mat = new std::array<double, N>[K];

  // vector of arrays
  // runtime: 0.93
  //  std::vector<std::array<double, N> > mat(K);

  // vector of vectors
  // runtime: 2.16s
  //  std::vector<std::vector<double> > mat(K, std::vector<double>(N));

  // fill the collection with some arbitrary values
  for (std::size_t k=0; k<K; ++k) {
    for (std::size_t j=0; j<N; ++j)
      mat[k][j] = k*N+j;
  }

  std::cerr << "constructed" << std::endl;

  // compute the sum of all pairwise distances in the collection
  double tot = 0.0;
   for (std::size_t j=0; j<K; ++j) {
     for (std::size_t k=0; k<K; ++k)
       tot += fast_sq_dist(mat[j], mat[k]);
   }

   std::cout << tot << std::endl;

  return 0;
}

NB 1: All versions print the same result.

NB 2: And just to demonstrate that the runtime differences between std::array<std::array<double, N>, K>, std::vector<std::array<double, N> >, and std::vector<std::vector<double> > wasn't simply from assignment/initialization when allocating, the runtimes of simply allocating the collection (i.e. commenting out the computation and printing of tot) were 0.000s, 0.000s, and 0.004s, respectively.

NB 3: Each method is compiled and run separately (not timed back-to-back within the same executable), to prevent unfair differences in caching.

NB 4:
Assembly for array of arrays: http://ideone.com/SM8dB
Assembly for vector of arrays: http://ideone.com/vhpJv
Assembly for vector of vectors: http://ideone.com/RZTNE

NB 5: Just to be absolutely clear, I am in no way intending to criticize STL. A absolutely love STL and, not only do I use it frequently, details of effective use have taught me a lot of subtle and great features of C++. Instead, this is an intellectual pursuit: I was simply timing things to learn principles of efficient C++ design.

Furthermore, it would be unsound to blame STL, because it is difficult to deconvolve the etiology of the runtime differential: With optimizations turned on, it can be from compiler optimizations that slow the code rather than quicken it. With optimizations turned off, it can be from unnecessary copy operations (that would be optimized out and never be executed in production code), which can be biased against certain data types more than others.

If you are curious like me, I'd love your help figuring this out.

like image 752
user Avatar asked Jul 01 '12 01:07

user


People also ask

Why are vectors faster than arrays?

A std::vector can never be faster than an array, as it has (a pointer to the first element of) an array as one of its data members. But the difference in run-time speed is slim and absent in any non-trivial program. One reason for this myth to persist, are examples that compare raw arrays with mis-used std::vectors.

How much slower are vectors than arrays?

The time difference is 0.333 seconds. Or a difference of 0.000000000333 per array access. Assuming a 2.33 GHz Processor like mine that's 0.7 instruction pipeline stages per array accesses. So the vector looks like it is using one extra instruction per accesses.

Why would you use an array instead of a vector?

Arrays are slightly more compact: the size is implicit. Arrays are non-resizable; sometimes this is desirable. Arrays don't require parsing extra STL headers (compile time). It can be easier to interact with straight-C code with an array (e.g. if C is allocating and C++ is using).

Are arrays more efficient than vectors?

Array is memory efficient data structure. Vector takes more time in accessing elements. Array access elements in constant time irrespective of their location as elements are arranged in a contiguous memory allocation.


1 Answers

Consider the second and third tests. Conceptually, they are identical: Allocate K * N * sizeof(double) bytes off the heap and then access them in exactly the same way. So why the different times?

All of your "faster" tests have one thing in common: new[]. All of the slower tests are allocated with new or on the stack. vector probably uses new[] Under the Hood™. The only obvious cause for this is that new[] and new have more significantly different implementations than expected.

What I'm going to suggest is that new[] will fall back to mmap and allocate directly on a page boundary, giving you an alignment speedup, whereas the other two methods will not allocate on a page boundary.

Consider using an OS allocation function to directly map committed pages, and then place a std::array<std::array<double, N>, K> into it.

like image 162
Puppy Avatar answered Oct 12 '22 02:10

Puppy