I need a function of an __m128i variable that has period 2^128. It doesn't need to monotonically increase (like a counter), but visit each value once.
The simplest example I could think of is in fact a 128-bit counter, but I found that difficult to implement in SSE. Are there any simpler/faster solutions?
Here's a monotonic counter. I'm not sure if you can call it simple though.
Assuming both ONE
and ZERO
are always in registers, then this should compile to 5 instructions. (7 or 8 if VEX-encoding is not used)
inline __m128i nextc(__m128i x){
const __m128i ONE = _mm_setr_epi32(1,0,0,0);
const __m128i ZERO = _mm_setzero_si128();
x = _mm_add_epi64(x,ONE);
__m128i t = _mm_cmpeq_epi64(x,ZERO);
t = _mm_and_si128(t,ONE);
t = _mm_unpacklo_epi64(ZERO,t);
x = _mm_add_epi64(x,t);
return x;
}
Test Code (MSVC):
int main() {
__m128i x = _mm_setr_epi32(0xfffffffa,0xffffffff,1,0);
int c = 0;
while (c++ < 10){
cout << x.m128i_u64[0] << " " << x.m128i_u64[1] << endl;
x = nextc(x);
}
return 0;
}
Output:
18446744073709551610 1
18446744073709551611 1
18446744073709551612 1
18446744073709551613 1
18446744073709551614 1
18446744073709551615 1
0 2
1 2
2 2
3 2
Slightly better version suggested by @Norbert P. It saves 1 instruction over my original solution.
inline __m128i nextc(__m128i x){
const __m128i ONE = _mm_setr_epi32(1,0,0,0);
const __m128i ZERO = _mm_setzero_si128();
x = _mm_add_epi64(x,ONE);
__m128i t = _mm_cmpeq_epi64(x,ZERO);
t = _mm_unpacklo_epi64(ZERO,t);
x = _mm_sub_epi64(x,t);
return x;
}
Never forget the KISS principle.
Pasting this (unsigned integers are required to wrap around by the C standard, hence visiting each value only once):
__uint128_t inc(__uint128_t x) {
return x+1;
}
into this yields (for x64):
addq $1, %rdi
adcq $0, %rsi
movq %rdi, %rax
movq %rsi, %rdx
ret
easy/fast enough? If you inline that, you'll probably be able to get away with just the addq
/adcq
(the movq
s and ret
are required by the x64 ABI: if you inline the function, they are not required)
To address Voo's comment about the suckiness of MSVC, you can use the following:
inline void inc(unsigned long long *x, unsigned long long *y) {
if (!++*x) ++*y; // yay for obfuscation!
}
I don't have a MSVC install nearby so I can't test it, but it should yield something similar to what I posted above. Then, if you really need a __m128i, you should be able to cast the two halves.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With