round much slower than floor/ceil/int in LLVM

Question

I was benchmarking some basic routines by running loops such as:

float *src, *dst;
for (int i=0; i<cnt; i++) dst[i] = round(src[i]);

All with an AVX2 target and the newest Clang. Interestingly, floor(x), ceil(x), int(x)... all seem fast. But round(x) seems extremely slow, and looking at the disassembly there is some weird spaghetti code instead of the newer SSE or AVX instructions. Even when I prevent the loop from being vectorized by introducing a dependency, round is about 10x slower. For floor etc. the generated code uses vroundss; for round there is the spaghetti code. Any ideas?

Edit: I'm using -ffast-math, -mfpmath=sse, -fno-math-errno, -O3, -std=c++17, -march=core-avx2 -mavx2 -mfma

Chris Dodd · Accepted Answer

The problem is that none of the SSE rounding modes implements the rounding that round requires. From the round(3) man page:

These functions round x to the nearest integer, but round halfway cases away from zero (regardless of the current rounding direction, see fenv(3)), instead of to the nearest even integer like rint(3).
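To make the difference concrete, here is a minimal sketch that checks whether the two functions disagree on a given input. The helper name rounding_differs is made up for illustration, and the sketch assumes the default floating-point rounding mode (round to nearest, ties to even).

```cpp
#include <cmath>

// Hypothetical helper for illustration only.
// Returns true when round() (ties away from zero) and rint() (current
// rounding mode, assumed here to be the default: ties to even) give
// different results for x.
bool rounding_differs(float x) {
    return std::round(x) != std::rint(x);
}
```

Under that assumption the two only disagree on exact halfway cases whose nearest even integer lies toward zero, such as 0.5 or 2.5; for 1.5 both return 2.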

If you want faster code, you could try testing rint instead of round, as that specifies a rounding mode that SSE does support.
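As a rough sketch, the loop from the question could be written with rint like this. The function name round_nearest_even and the std::size_t count are assumptions made for the example, and whether the compiler actually emits vroundss or a vectorized equivalent depends on the compiler version and flags, so it is worth checking the disassembly.

```cpp
#include <cmath>
#include <cstddef>

// Sketch of the question's loop with rint in place of round.
// rint rounds according to the current rounding mode; with the default
// mode that means halfway cases go to the nearest even integer, so the
// results differ from round() on inputs like 0.5 or 2.5.
void round_nearest_even(const float* src, float* dst, std::size_t cnt) {
    for (std::size_t i = 0; i < cnt; ++i) {
        dst[i] = std::rint(src[i]);
    }
}
```

This only helps if ties-to-even results are acceptable for the data; if halfway cases must round away from zero, rint is not a drop-in replacement.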

Tags:

c++

rounding

floor

clang++

Vojtěch Melda Meluzín

