Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

C++ Adding 2 arrays together quickly

Given the arrays:

int canvas[10][10];
int addon[10][10];

Where all the values range from 0 - 100, what is the fastest way in C++ to add those two arrays so each cell in canvas equals itself plus the corresponding cell value in addon?

IE, I want to achieve something like:

canvas += another;

So if canvas[0][0] =3 and addon[0][0] = 2 then canvas[0][0] = 5

Speed is essential here as I am writing a very simple program to brute force a knapsack type problem and there will be tens of millions of combinations.

And as a small extra question (thanks if you can help!) what would be the fastest way of checking if any of the values in canvas exceed 100? Loops are slow!

like image 500
Tom Gullen Avatar asked Jun 02 '10 16:06

Tom Gullen


People also ask

Can you add arrays together in C?

To concate two arrays, we need at least three array variables. We shall take two arrays and then based on some constraint, will copy their content into one single array. Here in this example, we shall take two arrays one will hold even values and another will hold odd values and we shall concate to get one array.

How do I combine two arrays?

To merge elements from one array to another, we must first iterate(loop) through all the array elements. In the loop, we will retrieve each element from an array and insert(using the array push() method) to another array. Now, we can call the merge() function and pass two arrays as the arguments for merging.

Can two arrays be added?

No, converting arrays into numbers and adding them is not possible as the length of arrays can be very lengthy.

How do you add to an array in C?

First get the element to be inserted, say x. Then get the position at which this element is to be inserted, say pos. Then shift the array elements from this position to one position forward(towards right), and do this for all the other elements next to pos.


2 Answers

Here is an SSE4 implementation that should perform pretty well on Nehalem (Core i7):

#include <limits.h>
#include <emmintrin.h>
#include <smmintrin.h>

static inline int canvas_add(int canvas[10][10], int addon[10][10])
{
    __m128i * cp = (__m128i *)&canvas[0][0];
    const __m128i * ap = (__m128i *)&addon[0][0];
    const __m128i vlimit = _mm_set1_epi32(100);
    __m128i vmax = _mm_set1_epi32(INT_MIN);
    __m128i vcmp;
    int cmp;
    int i;

    for (i = 0; i < 10 * 10; i += 4)
    {
        __m128i vc = _mm_loadu_si128(cp);
        __m128i va = _mm_loadu_si128(ap);

        vc = _mm_add_epi32(vc, va);
        vmax = _mm_max_epi32(vmax, vc);   // SSE4 *

        _mm_storeu_si128(cp, vc);

        cp++;
        ap++;
    }
    vcmp = _mm_cmpgt_epi32(vmax, vlimit); // SSE4 *
    cmp = _mm_testz_si128(vcmp, vcmp);    // SSE4 *
    return cmp == 0;
}

Compile with gcc -msse4.1 ... or equivalent for your particular development environment.

For older CPUs without SSE4 (and with much more expensive misaligned loads/stores) you'll need to (a) use a suitable combination of SSE2/SSE3 intrinsics to replace the SSE4 operations (marked with an * above) and ideally (b) make sure your data is 16-byte aligned and use aligned loads/stores (_mm_load_si128/_mm_store_si128) in place of _mm_loadu_si128/_mm_storeu_si128.

like image 56
Paul R Avatar answered Oct 06 '22 00:10

Paul R


You can't do anything faster than loops in just C++. You would need to use some platform specific vector instructions. That is, you would need to go down to the assembly language level. However, there are some C++ libraries that try to do this for you, so you can write at a high level and have the library take care of doing the low level SIMD work that is appropriate for whatever architecture you are targetting with your compiler.

MacSTL is a library that you might want to look at. It was originally a Macintosh specific library, but it is cross platform now. See their home page for more info.

like image 40
A. Levy Avatar answered Oct 06 '22 01:10

A. Levy