
How to access SIMD vector elements when overloading array access operators?

I am trying to get some SIMD code that works with MSVC to compile with Clang in Xcode 6. Unfortunately, I get an error that I am unable to fix where the array access operators are overloaded in a custom vector class. The vector template has specializations for arrays of length 4 and 8 that use SIMD intrinsics, but the array access operator that returns a reference to an element of the vector (for updating that element) fails on Clang with "non-const reference cannot bind to vector element".

Full source code

The overloaded operators:

#ifdef _MSC_VER
  float operator[](int idx) const { return v.m256_f32[idx]; } // m256_f32 MSVC only
  float& operator[](int idx) { return v.m256_f32[idx]; }
#else
  float operator[](int idx) const { return v[idx]; }
  float& operator[](int idx) { return v[idx]; }
#endif

The error from Clang:

non-const reference cannot bind to vector element
  float& operator[](int idx) { return v[idx]; }
                                      ^~~~~~
olilarkin asked Oct 24 '14 19:10



2 Answers

I think you'll probably need to use a union for this, e.g.:

union U {
    __m256 v;
    float a[8];
};

and the value operator would then be:

float operator[](int idx) const { U u = { v }; return u.a[idx]; }

The reference operator is trickier though, and the only way I can see to do it is via type punning, so with the usual caveats:

float& operator[](int idx) { return ((float *)&v)[idx]; }

I'm not even sure this will compile, and you may need -fno-strict-aliasing.

To avoid this nastiness I suppose you could consider changing your member variable from __m256 v; to U u;.
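As a rough sketch of that last suggestion, here is what a wrapper with the union as its member might look like. To keep it buildable with baseline SSE it uses the 4-lane __m128 case rather than __m256; the 8-lane version would follow the same pattern with float a[8]. The class name Vec4f and the simd() accessor are made up for illustration and are not from the question's source. Reading a union member other than the one last written is formally undefined in standard C++, so this relies on the union type punning that GCC and Clang support as an extension.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: __m128, _mm_set_ps, _mm_add_ps, _mm_setzero_ps

// Hypothetical 4-lane wrapper (name and layout are assumptions for this sketch).
class Vec4f {
public:
    Vec4f() { u.v = _mm_setzero_ps(); }
    explicit Vec4f(__m128 x) { u.v = x; }
    // _mm_set_ps takes lanes from highest to lowest, so lane 0 ends up as 'a'.
    Vec4f(float a, float b, float c, float d) { u.v = _mm_set_ps(d, c, b, a); }

    // Value access: plain read through the float view of the union.
    float operator[](int idx) const {
        assert(idx >= 0 && idx < 4);
        return u.a[idx];
    }

    // Reference access: binds to an ordinary float array element,
    // so Clang has no vector element to complain about.
    float& operator[](int idx) {
        assert(idx >= 0 && idx < 4);
        return u.a[idx];
    }

    // SIMD view for intrinsics.
    __m128 simd() const { return u.v; }

    friend Vec4f operator+(const Vec4f& l, const Vec4f& r) {
        return Vec4f(_mm_add_ps(l.u.v, r.u.v));
    }

private:
    union U {
        __m128 v;    // used by the intrinsics
        float a[4];  // used by operator[]
    } u;
};
```

With this layout, x[2] = 10.0f; writes through the float view and a following x + y picks the new value up through the __m128 view, at the cost of the compiler possibly spilling the register to memory around element accesses.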

I just hope you're not doing this kind of thing inside any performance-critical loops.

Paul R answered Nov 06 '22 15:11



The following only works for reading, so you wouldn't be able to return a reference to float. But it should work with runtime values (the lane index itself has to be a compile-time constant for these intrinsics):

  • SSE4.1 and later: use an extract intrinsic such as _mm_extract_ps (EXTRACTPS) or _mm_extract_epi32 (PEXTRD)
  • before SSE4.1: _mm_srli_si128 + _mm_cvtsi128_si32 to access any 32-bit item (then cast with a union)
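
A minimal sketch of the pre-SSE4.1 route for a 4-lane __m128, assuming SSE2 is available (it is baseline on x86-64). The helper name extract_lane is hypothetical. The lane is a template parameter because _mm_srli_si128 needs an immediate byte count, and the final bit reinterpretation uses std::memcpy here instead of the union cast mentioned above.

```cpp
#include <cstring>
#include <emmintrin.h>  // SSE2: _mm_srli_si128, _mm_cvtsi128_si32, _mm_castps_si128

// Hypothetical read-only accessor; Lane must be known at compile time.
template <int Lane>
inline float extract_lane(__m128 v) {
    static_assert(Lane >= 0 && Lane < 4, "lane out of range");
    // Shift the whole register right by Lane * 4 bytes so the wanted
    // 32-bit item lands in the lowest lane.
    __m128i shifted = _mm_srli_si128(_mm_castps_si128(v), Lane * 4);
    // Move the lowest 32 bits into an integer register.
    int bits = _mm_cvtsi128_si32(shifted);
    // Reinterpret the bit pattern as a float without aliasing issues.
    float out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}
```

A runtime index would need a switch over the four instantiations, or a store to a temporary float array.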
ponce answered Nov 06 '22 16:11
