
C++ calculate and sort vector at compile time

I have a class A that has a std::vector<int> as an attribute. A needs to fill this vector when an instance of A is created. The calculation can take some time, and I would like to know if:

  1. it can be done at compile time.
  2. the vector can be sorted at compile time as well

I am not familiar with metaprogramming and I haven't found a way to do it so far. It's not an OS-specific question.

Here is the A.cpp file:

#include "A.h"
#define SIZEV 100

A::A()
{
    fillVector();
}

void A::fillVector()
{
    // m_vector is an attribute of class "A"
    // EXPECTATION 1 : fill the vector with the following calculation at compile time

    const int a=5;
    const int b=7;
    const int c=9;

    for(int i=0;i<SIZEV;i++){
        for(int j=0;j<SIZEV;j++){
            for(int k=0;k<SIZEV;k++){
                this->m_vector.push_back(a*i+b*j+c*k);
            }
        }
    }

    // EXPECTATION 2 : sort the vector at compile time
}

std::vector<int> A::getVector() const
{
    return m_vector;
}

void A::setVector(const std::vector<int> &vector)
{
    m_vector = vector;
}

and the main.cpp (a Qt app, but that doesn't matter):

#include <QCoreApplication>
#include "A.h"

int main(int argc, char *argv[])
{
    QCoreApplication app(argc, argv);

    A a;
    auto vec = a.getVector();

    // use vec, etc ...

    return app.exec();
}
asked Sep 18 '15 by BlackPopa


4 Answers

This is a simple compile-time sort for ints. For each element it works out that element's position within the sorted list; from that it works out which value belongs at each position, and then it builds a new list with everything inserted at the appropriate position. It's probably not as efficient in terms of complexity (it's O(n^2)) as the previous solution, but it's much easier to understand and it doesn't use recursion.

#include <initializer_list>
#include <tuple>
#include <utility>   // std::make_index_sequence

template<int... members>
struct IntList
{
    constexpr bool operator==(IntList) const { return true; }

    template<int... others>
    constexpr bool operator==(IntList<others...>) const { return false; }

    template<int idx>
    static constexpr auto at() 
    {
        return std::get<idx>(std::make_tuple(members...));
    }

    template<int x>
    static constexpr auto indexOf()
    {
        int sum {};
        auto _ = { 0, (x > members ? ++sum : 0)... };
        return sum;
    }

    template<int x>
    static constexpr auto count()
    {
        int sum {};
        auto _ = { 0, (x == members ? ++sum : 0)... };
        return sum;
    }

    template<int i>
    static constexpr auto ith()
    {
        int result{};
        auto _ = {
            ( i >= indexOf<members>() && i < indexOf<members>() + count<members>() ? 
              result = members : 0 )...
        };
        return result;
    }

    template<std::size_t... i>
    static constexpr auto sortImpl(std::index_sequence<i...>)
    {
        return IntList< ith<i>()... >();
    }

    static constexpr auto sort() 
    {
        return sortImpl(std::make_index_sequence<sizeof...(members)>());
    }
};

static_assert(IntList<1, 2, 3>().at<1>() == 2, "");

static_assert(IntList<>().indexOf<1>()           == 0, "");
static_assert(IntList<1>().indexOf<1>()          == 0, "");
static_assert(IntList<1, 2, 3, 4>().indexOf<3>() == 2, "");

static_assert(IntList<>().count<1>()        == 0, "");
static_assert(IntList<1>().count<1>()       == 1, "");
static_assert(IntList<1, 1>().count<1>()    == 2, "");
static_assert(IntList<2, 2, 1>().count<1>() == 1, "");
static_assert(IntList<1, 2, 1>().count<1>() == 2, "");

static_assert(IntList<>().sort()        == IntList<>(),        "");
static_assert(IntList<1>().sort()       == IntList<1>(),       "");
static_assert(IntList<1, 2>().sort()    == IntList<1, 2>(),    "");
static_assert(IntList<2, 1>().sort()    == IntList<1, 2>(),    "");
static_assert(IntList<3, 2, 1>().sort() == IntList<1, 2, 3>(), "");
static_assert(IntList<2, 2, 1>().sort() == IntList<1, 2, 2>(), "");
static_assert(IntList<4, 7, 2, 5, 1>().sort() == IntList<1, 2, 4, 5, 7>(), "");
static_assert(IntList<4, 7, 7, 5, 1, 1>().sort() == IntList<1, 1, 4, 5, 7, 7>(), "");
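A compile-time sorted IntList still has to become runtime data at some point. As a small usage sketch (the IntList here is a stub with the same template shape as the one above, so the snippet is self-contained), a pack-to-array helper turns any IntList into a constexpr std::array, which a constructor could then copy into a member vector:

```cpp
#include <array>

// Stub with the same template shape as the IntList above.
template<int... members> struct IntList {};

// Expand the pack into a constexpr std::array (C++14).
template<int... members>
constexpr std::array<int, sizeof...(members)> toArray(IntList<members...>)
{
    return {{ members... }};
}

constexpr auto sorted = toArray(IntList<1, 2, 4, 5, 7>{});
static_assert(sorted[0] == 1 && sorted[4] == 7, "data usable in constant expressions");
```

The same helper works on the result of `sort()`, since that is just another IntList specialization.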
answered Oct 25 '22 by Ab Wilson


The data is integers from 0 to (SIZEV-1) * (a+b+c), but the number of integers is SIZEV³. It's a dense group of integers with a small range, so CountingSort is perfect (and you never need to build the unsorted array: just increment counts while generating).

Regardless of whether you keep the counts / prefix sums around, CountingSort is absolutely going to be a big win in startup time over comparison-based sorts, everything else being equal.
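As a minimal, self-contained sketch of that idea (a standalone function rather than the class member below, with `n` standing in for SIZEV): count occurrences while generating, then emit each value `count` times, so the unsorted array never exists.

```cpp
#include <cstddef>
#include <vector>

// Counting sort fused with generation: increment counts[] while producing
// the values, then emit each value `count` times, already in order.
std::vector<int> generateSorted(int n, int a, int b, int c)
{
    const int max_val = (n - 1) * (a + b + c);
    std::vector<std::size_t> counts(max_val + 1, 0);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                ++counts[a*i + b*j + c*k];        // never materialize the unsorted data

    std::vector<int> out;
    out.reserve(std::size_t(n) * n * n);
    for (int val = 0; val <= max_val; ++val)
        out.insert(out.end(), counts[val], val);  // count may be zero; that's fine
    return out;
}
```

The member-function version below adds the prefix-sum bookkeeping on top of this same structure.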

You can keep a compact form (O(cuberoot(n)) size) of your data as a vector of prefix sums, for lookups from m_vector in O(log (cuberoot(n))) time (binary search the prefix sums), where n is the length of m_vector. See below.

Depending on cache / memory latency, not actually expanding m_vector might or might not be a performance win. If a range of values is needed, you can very quickly generate sequential elements of m_vector on the fly, from the prefix sums.

#include <cassert>
#include <cstdint>
#include <vector>
using std::vector;

class A {
    vector<uint32_t> m_counts;  // 16b is enough only for SIZEV <= 793 (found experimentally); could be a local instead
    vector<uint32_t> m_pos;     // values are huge: indices into m_vector, up to SIZEV**3 - 1
    vector<uint16_t> m_vector;  // can be 16b until SIZEV>3121: max val is only (a+b+c) * (SIZEV-1)
public:
    void fillVector();
};
void A::fillVector()
{
    const int a=5;
    const int b=7;
    const int c=9;

    const auto max_val = (SIZEV-1) * (a+b+c);

    m_vector.reserve(SIZEV*SIZEV*SIZEV);
    m_vector.resize(0);
    // or clear it, but that writes tons of mem, unless you use a custom Allocator::construct to leave it uninit
    // http://en.cppreference.com/w/cpp/container/vector/resize

    m_pos.resize(max_val + 1);  // again, ideally avoid zeroing
                  // but if not, do it before m_counts

    m_counts.clear();  // do this one last, so it's hot in cache even if others wasted time writing zeros.
    m_counts.resize(max_val + 1); // vector is now zeroed
    // Optimization: don't have a separate m_counts.
    // zero and count into m_pos, then do prefix summing in-place


    // manually strength-reduce the multiplication to addition
    // in case the compiler decides it won't, or can't prove it won't overflow the same way
    // Not necessary with gcc or clang: they both do this already
    for(int kc=c*(SIZEV-1) ; kc >= 0 ; kc-=c) {
      for(int jb=b*(SIZEV-1) ; jb >= 0 ; jb-=b) {
        for(int ia=a*(SIZEV-1) ; ia >= 0 ; ia-=a) {
          m_counts[kc + jb + ia]++;
          // do the smallest stride in the inner-most loop, for better cache locality
        }
      }
    }
// write the early elements last, so they'll be hot in the cache when we're done


    int val = 0;
    uint32_t sum = 0;
    for ( auto &count : m_counts ) {
       m_pos[val] = sum;   // build our vector of prefix sums: start of the run of val
       m_vector.insert(m_vector.end(), count, val++);
       // count is allowed to be zero for vector::insert(pos, count, value)
       sum += count;

       //count = (sum+=count);  // in-place conversion to prefix sums
    }
    assert(m_vector.size() == SIZEV*SIZEV*SIZEV);
}

Or, instead of actually expanding a 1.6GB array, make prefix sums of the counts, giving you a vector of the start position of the run of each value in m_vector. i.e. idx = m_pos[val]; m_vector[idx] == val. (This breaks down for val <= 13, where some values can't be represented as a non-negative combination of a, b, and c, so there are zeros in m_counts and repeats in m_pos.)

Anyway, you can replace a read of m_vector[i] with a binary-search for i in m_pos. You're looking for the highest index in m_pos that has value <= i. That index is what you'd find at m_vector[i]. (Or something like that; I may have an off-by-one error.)
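A sketch of that lookup (`valueAt` is a hypothetical free function; the real thing would be a member of A): std::upper_bound finds the first prefix sum greater than i, and the slot just before it is the value stored at virtual index i.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// m_pos[val] = index in the (virtual) m_vector where the run of `val` starts.
// Reading element i then means: the largest val with m_pos[val] <= i.
int valueAt(const std::vector<uint32_t>& m_pos, uint32_t i)
{
    // First entry > i; everything before it is <= i, so step back one slot.
    auto it = std::upper_bound(m_pos.begin(), m_pos.end(), i);
    return static_cast<int>((it - m_pos.begin()) - 1);
}
```

Repeated entries in m_pos (values with zero count) are handled naturally: upper_bound skips past all of them, landing on the largest value whose run actually starts at or before i.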

A hash table won't work, because you need to map multiple values of i to each number from 0..(750*(a+b+c)) — all the i's where m_vector[i] has the same value.

If you need a run of sequential elements, generate them on the fly into a tmp buffer. Look at m_pos[i+1] to see when the next element with a different value is coming. (Looking at m_counts might save some subtraction, but you're probably better off just taking differences in m_pos to invert the prefix sums, to avoid cache misses / cache pollution from touching a 2nd array.)
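A sketch of that on-the-fly generation (`genRange` and its signature are made up for illustration): walk the prefix sums once and emit each value's run, without ever touching a full m_vector.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Materialize virtual elements m_vector[lo..hi) into a small buffer, using
// only the prefix sums.  m_pos[val] = start index of the run of `val`.
std::vector<int> genRange(const std::vector<uint32_t>& m_pos,
                          uint32_t lo, uint32_t hi)
{
    std::vector<int> out;
    out.reserve(hi - lo);
    // Value whose run contains index lo:
    int val = static_cast<int>(std::upper_bound(m_pos.begin(), m_pos.end(), lo)
                               - m_pos.begin()) - 1;
    for (uint32_t i = lo; i < hi; ++i) {
        // Advance past runs that end at or before i (empty runs show up as repeats in m_pos).
        while (val + 1 < static_cast<int>(m_pos.size()) && m_pos[val + 1] <= i)
            ++val;
        out.push_back(val);
    }
    return out;
}
```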

Actually, m_counts probably doesn't need to be kept around as a class member at all; it could just be a temporary in fillVector. Or fillVector can count directly into m_pos and convert it in place into prefix sums.

Ideally there's something clever you can do with templates to choose types that are wide enough, but no wider than needed, for m_counts and m_vector. I don't know enough number theory to prove that no single bucket of m_counts overflows a uint16_t. The average count will be 750**3 / (750*(5+7+9)) = 26786, and they're certainly clustered towards the high end of m_counts. In practice, SIZEV=793 can use uint16_t counters, while SIZEV=794 produces several counts > 65536 (thanks to Chris for the working example where I could easily test this).

m_vector can be uint16_t until (SIZEV-1)*(a+b+c) > MAX_UINT16 (65535), i.e. until SIZEV >= 3122, at which point its ~3.04e10 elements need 4 bytes each and m_vector would take over 100 GiB of RAM.


At SIZEV = 750, m_pos is about 2x L1 cache size on a typical Intel CPU (750*(5+7+9) entries * 4B each = 63000B). If the compiler does a good job and implements the binary search with conditional-move instead of unpredictable branch instructions, this could be quite fast. It will certainly save you a lot of main-memory traffic, which is valuable if you have multiple threads.

Alternatively, never touching m_vector means you can handle problem sizes which would require more memory than you have to store the list.

If you want to get really creative with optimizing for cache when creating m_counts in the first place (with the triple-nested loop), have the innermost loop go forwards and then back, instead of the same direction both times. This will only matter for extremely large SIZEV, or if the other hyperthread is putting a lot of pressure on the cache.

  for (int kc = c*(SIZEV-1) ; kc >= 0 ; kc -= c) {
    int jb = b*(SIZEV-1);
    while (true) {
      for (int ia = 0 ; ia < SIZEV*a ; ia += a)       // forwards...
        counts[kc + jb + ia]++;
      if ((jb -= b) < 0) break;
      for (int ia = a*(SIZEV-1) ; ia >= 0 ; ia -= a)  // ...then backwards
        counts[kc + jb + ia]++;
      if ((jb -= b) < 0) break;
    }
  }

Counting down towards zero (with or without the bidirectional inner loops) is very likely a small win for the beginning of the next loop, before it becomes memory-bound doing big memsets when counts get high. Also a win for scanning through forwards to do prefix sums in place.


My previous answer, which is probably a dead end:

Is there any hope of finding a closed-form formula for the ith element in the sorted vector? Or even an O(log i) algorithm for generating it on the fly?

Unless you need a lot of sequential elements from this vector when you access it, it might be faster to compute it on the fly. Memory is slow, CPU is fast, so if you can compute a[i] in under ~150 clock cycles, you come out ahead. (Assuming every access is a cache miss, or that not touching all that vector memory reduces cache misses in the rest of your program).

If we can do this, we could in theory write the sorted array in order in the first place.

To do that: shuffle the constants so a <= b <= c.

0, a, [a*2 .. a*int(b/a)], b, [b + a .. b + a*int((c-b)/a) mixed with b*2 .. b*int(c/b)], c, [some number of b*x + a*y], c+a, [more b*x + a*y], ...

Ok, so this is turning into a combinatorial mess, and this idea probably isn't viable. At least, not for the general case of any a, b, and c.

With a=5, b=7, c=9:

0, 5=a, 7=b, 9=c, 10=2a, 12=b+a, 14=2b, 14=c+a, 15=3a, 16=c+b, 17=2a+b, 18=2c

I'm too sleepy to see a pattern, but here's a longer list

# bash
limit=5; for ((i=0 ; i<limit ; i++)); do
             for ((j=0 ; j<limit ; j++)); do 
               for ((k=0 ; k<limit ; k++)); do 
                 printf "%2d: %d %d %d\n" $((5*i + 7*j + 9*k)) $i $j $k; 
           done; done; done | sort -n | cat -n
     1   0: 0 0 0
     2   5: 1 0 0
     3   7: 0 1 0
     4   9: 0 0 1
     5  10: 2 0 0
     6  12: 1 1 0
     7  14: 0 2 0
     8  14: 1 0 1
     9  15: 3 0 0
    10  16: 0 1 1
    11  17: 2 1 0
    12  18: 0 0 2
    13  19: 1 2 0
    14  19: 2 0 1
    15  20: 4 0 0
    16  21: 0 3 0
    17  21: 1 1 1
    18  22: 3 1 0
    19  23: 0 2 1
    20  23: 1 0 2
    21  24: 2 2 0
    22  24: 3 0 1
    23  25: 0 1 2
    24  26: 1 3 0
    25  26: 2 1 1
    26  27: 0 0 3
    27  27: 4 1 0
    28  28: 0 4 0
    29  28: 1 2 1
    30  28: 2 0 2
    31  29: 3 2 0
    32  29: 4 0 1
    33  30: 0 3 1
    34  30: 1 1 2
    35  31: 2 3 0
    36  31: 3 1 1
    37  32: 0 2 2
    38  32: 1 0 3
    39  33: 1 4 0
    40  33: 2 2 1
    41  33: 3 0 2
    42  34: 0 1 3
    43  34: 4 2 0
    44  35: 1 3 1
    45  35: 2 1 2
    46  36: 0 0 4
    47  36: 3 3 0
    48  36: 4 1 1
    49  37: 0 4 1
    50  37: 1 2 2
    51  37: 2 0 3
    52  38: 2 4 0
    53  38: 3 2 1
    54  38: 4 0 2
    55  39: 0 3 2
    56  39: 1 1 3
    57  40: 2 3 1
    58  40: 3 1 2
    59  41: 0 2 3
    60  41: 1 0 4
    61  41: 4 3 0
    62  42: 1 4 1
    63  42: 2 2 2
    64  42: 3 0 3
    65  43: 0 1 4
    66  43: 3 4 0
    67  43: 4 2 1
    68  44: 1 3 2
    69  44: 2 1 3
    70  45: 3 3 1
    71  45: 4 1 2
    72  46: 0 4 2
    73  46: 1 2 3
    74  46: 2 0 4
    75  47: 2 4 1
    76  47: 3 2 2
    77  47: 4 0 3
    78  48: 0 3 3
    79  48: 1 1 4
    80  48: 4 4 0
    81  49: 2 3 2
    82  49: 3 1 3
    83  50: 0 2 4
    84  50: 4 3 1
    85  51: 1 4 2
    86  51: 2 2 3
    87  51: 3 0 4
    88  52: 3 4 1
    89  52: 4 2 2
    90  53: 1 3 3
    91  53: 2 1 4
    92  54: 3 3 2
    93  54: 4 1 3
    94  55: 0 4 3
    95  55: 1 2 4
    96  56: 2 4 2
    97  56: 3 2 3
    98  56: 4 0 4
    99  57: 0 3 4
   100  57: 4 4 1
   101  58: 2 3 3
   102  58: 3 1 4
   103  59: 4 3 2
   104  60: 1 4 3
   105  60: 2 2 4
   106  61: 3 4 2
   107  61: 4 2 3
   108  62: 1 3 4
   109  63: 3 3 3
   110  63: 4 1 4
   111  64: 0 4 4
   112  65: 2 4 3
   113  65: 3 2 4
   114  66: 4 4 2
   115  67: 2 3 4
   116  68: 4 3 3
   117  69: 1 4 4
   118  70: 3 4 3
   119  70: 4 2 4
   120  72: 3 3 4
   121  74: 2 4 4
   122  75: 4 4 3
   123  77: 4 3 4
   124  79: 3 4 4
   125  84: 4 4 4
answered Oct 15 '22 by Peter Cordes


A std::vector<int> does not have any constexpr constructors (because dynamic memory allocation is disallowed for constexpr). So you can't sort a std::vector<int> at compile-time.

You can create a std::array<int, N> at compile-time for a constant N, but you'd have to write your own sorting routine because std::sort is not constexpr either.
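For illustration, such a hand-rolled constexpr sorting routine is short. This sketch requires C++17, where std::array's mutating operator[] became usable in constexpr functions (with C++14 you'd have to sort a raw array wrapper instead):

```cpp
#include <array>
#include <cstddef>

// constexpr insertion sort over std::array (std::sort itself only became
// constexpr in C++20).  Needs C++17 for mutable constexpr array access.
template<typename T, std::size_t N>
constexpr std::array<T, N> sortedArray(std::array<T, N> a)
{
    for (std::size_t i = 1; i < N; ++i) {
        T key = a[i];
        std::size_t j = i;
        for (; j > 0 && a[j - 1] > key; --j)
            a[j] = a[j - 1];    // shift larger elements right
        a[j] = key;
    }
    return a;
}

constexpr auto sorted17 = sortedArray(std::array<int, 5>{{4, 7, 2, 5, 1}});
static_assert(sorted17[0] == 1 && sorted17[4] == 7, "sorted at compile time");
```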

You can also write a Boost.MPL compile-time vector or list and use the sort routine of that. But this won't scale as well as std::array.

Another angle of attack is to store the data in a static variable and do the sorting at program initialization. Your program takes a bit longer to start, but the rest of its functionality is unaffected.

Since sorting is O(N log N) but loading is O(N), you might even use a two-step build: write the sorted vector to a file, and either compile/link it into your main program, or load it in O(N) at program startup into a static variable.

answered Oct 25 '22 by TemplateRex


The classical approach for lengthy calculations that can be precomputed is to do the calculation as part of the build process, generating a .cpp that hardcodes the result (on platforms that have embedded resources, these can be used as well).
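A sketch of that build step (`emitSortedTable` and the `sorted_data` name are made up for illustration): a tiny generator program computes and sorts the table, then prints C++ source that a pre-build step writes to a file for the real build to compile in.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Build-step sketch: return C++ source hardcoding the sorted table.
// A pre-build tool would write this to a generated file, and the main
// program would compile it in instead of computing at startup.
std::string emitSortedTable(int n, int a, int b, int c)
{
    std::vector<int> v;
    v.reserve(static_cast<std::size_t>(n) * n * n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                v.push_back(a*i + b*j + c*k);
    std::sort(v.begin(), v.end());

    std::string out = "static const int sorted_data[] = {";
    for (int x : v)
        out += std::to_string(x) + ",";
    out += "};\n";
    return out;
}
```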

However, here the calculation is extremely simple; the slow part is probably just the allocation, which, if you want to keep the data in a std::vector, has to happen at runtime. If you can live with a C-style array, you could put it all in the executable as described above, but that would make the executable about 4 MB larger, and the slowdown from loading it from disk would offset any speed benefit of the precalculation.

IOW: precalculating at build time makes sense when the computation is expensive and the output is small. Your case is at the opposite end of the spectrum, so I would avoid it.

answered Oct 25 '22 by Matteo Italia