Division of floating point numbers on GPU different from that on CPU

Tags: cuda, gpu

When I divide two floating point numbers on the GPU, the result is 0.196405. When I divide them on the CPU, the result is 0.196404. The actual value, computed with a calculator, is 0.196404675. How do I make the division on the GPU and the CPU produce the same result?
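Here is a minimal sketch of the comparison I am doing (the operand values below are placeholders, not my real data):

    #include <cstdio>

    __global__ void divide(float a, float b, float *out)
    {
        *out = a / b;  // single-precision division on the device
    }

    int main()
    {
        float a = 0.57f, b = 2.9021f;  // placeholder operands

        float *d_out, gpu;
        cudaMalloc(&d_out, sizeof(float));
        divide<<<1, 1>>>(a, b, d_out);
        cudaMemcpy(&gpu, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        cudaFree(d_out);

        printf("GPU: %.6f\nCPU: %.6f\n", gpu, a / b);  // may differ in the last digit
        return 0;
    }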

asked Dec 18 '12 by Programmer



3 Answers

As the comments to another answer suggest, there are many reasons why it is not realistic to expect the same results from floating point computations run on the CPU and GPU. It's much stronger than that: you can't assume that FP results will be the same when the same source code is compiled against a different target architecture (e.g. x86 or x64) or with different optimization levels, either.

In fact, if your code is multithreaded and the FP operations are performed in different orders from one run to the next, then the EXACT SAME EXECUTABLE running on the EXACT SAME SYSTEM may produce slightly different results from one run to the next.
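That effect is easy to reproduce with a float atomicAdd reduction (a minimal sketch; requires compute capability 2.0 or later): the order in which threads commit their additions is unspecified, so the rounded sum can vary from run to run on the same machine.

    #include <cstdio>

    __global__ void sum(const float *x, int n, float *out)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            atomicAdd(out, x[i]);  // commit order across threads is unspecified
    }

    int main()
    {
        const int n = 1 << 20;
        float *h = new float[n];
        for (int i = 0; i < n; ++i)
            h[i] = 1.0f / (i + 1);   // values of widely varying magnitude

        float *d_x, *d_out, result;
        cudaMalloc(&d_x, n * sizeof(float));
        cudaMalloc(&d_out, sizeof(float));
        cudaMemcpy(d_x, h, n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemset(d_out, 0, sizeof(float));

        sum<<<(n + 255) / 256, 256>>>(d_x, n, d_out);
        cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
        printf("%.9f\n", result);  // may vary slightly between runs

        cudaFree(d_x); cudaFree(d_out); delete[] h;
        return 0;
    }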

Some of the reasons include, but are not limited to:

  • floating point operations are not associative, so seemingly-benign reorderings (such as the race conditions from multithreading mentioned above) can change results (see the sketch after this list);
  • different architectures support different levels of precision and rounding under different conditions (e.g. via compiler flags, a control word, or per-instruction rounding modes);
  • different compilers interpret the language standards differently; and
  • some architectures support FMAD (fused multiply-add) and some do not (also shown in the sketch below).
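The first and last bullets are easy to demonstrate even on the host. A minimal sketch in plain C++ (compilable with g++ or nvcc; the constants are contrived purely to expose the rounding):

    #include <cstdio>
    #include <cmath>

    int main()
    {
        // Non-associativity: the small term c survives only if the two
        // large terms cancel first.
        float a = 1e8f, b = -1e8f, c = 1e-3f;
        printf("(a + b) + c = %.9g\n", (a + b) + c);  // ~0.001
        printf("a + (b + c) = %.9g\n", a + (b + c));  // 0: c was absorbed into b

        // Fused multiply-add: std::fmaf rounds once; the separate
        // multiply rounds the product before the add.
        float x = 1.000244140625f;   // 1 + 2^-12, exactly representable
        float z = -1.00048828125f;   // -(1 + 2^-11), exactly representable
        float p = x * x;             // product rounded to float here
        printf("p + z        = %.9g\n", p + z);              // 0
        printf("fmaf(x,x,z)  = %.9g\n", std::fmaf(x, x, z)); // 2^-24 ~ 5.96e-08
        return 0;
    }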

Note that for purposes of this discussion, the JIT compilers for CUDA (the magic that enables PTX code to be future-proof to GPU architectures that are not yet available) certainly should be expected to perturb FP results.

You have to write FP code that is robust despite the foregoing.

As I write this today, I believe that CUDA GPUs have a much better-designed architecture for floating point arithmetic than any contemporary CPU. GPUs include native IEEE 754 (2008) support for 16-bit floats and FMAD, have full-speed support for denormals, and enable rounding control on a per-instruction basis rather than through control words whose settings have side effects on all FP instructions and are expensive to change.
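For instance, CUDA exposes that per-instruction rounding control through device intrinsics. A minimal sketch (the __fdiv_* intrinsics are documented CUDA device functions; the 1/3 operands are just an illustration):

    #include <cstdio>

    __global__ void div_rounding(float a, float b, float *out)
    {
        // Each intrinsic selects a rounding mode for this one operation;
        // no global control word is touched.
        out[0] = __fdiv_rn(a, b);  // round to nearest even (IEEE default)
        out[1] = __fdiv_rz(a, b);  // round toward zero
        out[2] = __fdiv_ru(a, b);  // round toward +infinity
        out[3] = __fdiv_rd(a, b);  // round toward -infinity
    }

    int main()
    {
        float *d, h[4];
        cudaMalloc(&d, sizeof(h));
        div_rounding<<<1, 1>>>(1.0f, 3.0f, d);
        cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
        cudaFree(d);
        for (int i = 0; i < 4; ++i)
            printf("%.9f\n", h[i]);  // the four roundings of 1/3
        return 0;
    }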

In contrast, CPUs have an excess of per-thread state and poor performance except when using SIMD instructions, which mainstream compilers are terrible at exploiting for performance (since vectorizing scalar C code to take advantage of such instruction sets is much more difficult than building a compiler for a pseudo-scalar architecture such as CUDA). And if the Wikipedia history page is to be believed, Intel and AMD appear to have completely botched the addition of FMAD support in a way that defies description.

You can find an excellent discussion of floating point precision and IEEE support in NVIDIA GPUs here:

https://developer.nvidia.com/content/precision-performance-floating-point-and-ieee-754-compliance-nvidia-gpus

answered Oct 23 '22 by ArchaeaSoftware


You don't. You should never assume that floating point values will be exactly equal to what you expect after mathematical operations. They are only defined to be correct to a specified precision, and results will vary slightly from processor to processor, regardless of whether that processor is a CPU or a GPU. An x86 processor using the legacy x87 FPU, for instance, will do floating point computations with 80 bits of internal precision and then round the result to the requested precision. Equivalence comparisons for floating point numbers should always use a tolerance, since no guarantee can be made that any two processors (or even the same processor, through different but mathematically equivalent sequences of instructions) will produce the same result. E.g. floating-point numbers a and b should be considered equal if and only if |a - b| < t for some tolerance t, as in the sketch below.
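In code, that comparison might look like this (a minimal sketch; the helper name and tolerance value are illustrative, not from any particular library):

    #include <cmath>

    // Absolute-tolerance comparison: treat a and b as equal when they
    // differ by less than t. A relative tolerance, scaled by the
    // magnitudes involved, is often better when values vary widely.
    bool nearly_equal(float a, float b, float t)
    {
        return std::fabs(a - b) < t;
    }

    // e.g. nearly_equal(0.196405f, 0.196404f, 1e-5f) returns true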

answered Oct 23 '22 by reirab


Which GPU are you using for the computation?

With single-precision floating point you will normally see a difference of ±1 in about the sixth significant digit; this comes from rounding error on the GPU.

If you use double precision, you will usually match the CPU result to far more digits, though exact equality is still not guaranteed. The trade-off is that double-precision throughput is at best about half that of single precision, and memory usage doubles. NVIDIA GPUs have supported double-precision computation since compute capability 1.3, and it became substantially faster with the Fermi architecture.
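A minimal double-precision sketch (must be compiled for a device of compute capability 1.3 or higher; the operands are placeholders):

    #include <cstdio>

    __global__ void divide(double a, double b, double *out)
    {
        *out = a / b;  // IEEE 754 double-precision division on the device
    }

    int main()
    {
        double *d, gpu;
        cudaMalloc(&d, sizeof(double));
        divide<<<1, 1>>>(1.0, 5.0915, d);   // placeholder operands
        cudaMemcpy(&gpu, d, sizeof(double), cudaMemcpyDeviceToHost);
        cudaFree(d);
        printf("GPU: %.15f\nCPU: %.15f\n", gpu, 1.0 / 5.0915);
        return 0;
    }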

answered Oct 23 '22 by Sijo