I'm doing my PhD research in A.I. and I've gotten to the part where I have to start using CUDA libraries for my testing platform. I've played with CUDA before, and I have a basic understanding of how GPGPU works, etc, but I am troubled by the float precision.
Looking at the GTX 680 I see FP64 at 1/24 of the FP32 rate, whereas the Tesla has full FP64 at 1.31 TFLOPS. I understand very well that one is a gaming card while the other is a professional card.
The reason I am asking is simple: I cannot afford a Tesla, but I may be able to get two GTX 680s. While the main goal is to have as many CUDA cores and as much memory as possible, float precision may become a problem.
My questions are:
Thanks!
These are very subjective questions.
It's not entirely clear that you understand the difference between the C or C++ float and double datatypes. FP32 vs. FP64 refers to float and double in C or C++. The 1/8 and 1/24 numbers you refer to do not affect precision; they affect throughput. All of the GPUs you mention have some FP64 double-precision capability, so the differences come down not to capability so much as to performance.
It's very important for you to understand whether the codes you care about depend on double-precision floating point or not. It's not enough to say things like "matrix operations" to understand whether FP32 (float) or FP64 (double) matters.
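To see why the float/double distinction is about accuracy of results rather than speed, here is a tiny host-side sketch (my own illustration, compilable with nvcc or any C++ compiler; the value 0.1 and the count are arbitrary) showing how an FP32 accumulator drifts where an FP64 one stays essentially exact:

```
// Sketch: the float/double choice affects precision of results;
// the 1/8 or 1/24 ratios only affect how fast FP64 math runs.
#include <cstdio>

int main()
{
    const int n = 10 * 1000 * 1000;

    float  sumF = 0.0f;   // FP32 accumulator
    double sumD = 0.0;    // FP64 accumulator

    for (int i = 0; i < n; ++i) {
        sumF += 0.1f;     // rounding error accumulates once the sum grows large
        sumD += 0.1;      // double keeps ~15-16 significant decimal digits
    }

    // The true sum is 1,000,000; the float result will be visibly off,
    // while the double result is very close.
    printf("float  sum: %f\n", sumF);
    printf("double sum: %f\n", sumD);
    return 0;
}
```

If your results change meaningfully between the two accumulators in your own algorithms, then FP64 matters for you; if not, FP32 may be sufficient.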
If your codes depend on FP64 (double), then those performance ratios (1/8, 1/24, etc.) will be relevant. But your codes should still run, perhaps more slowly.
You're also using some terms in a fashion that may lead to confusion. Tesla refers to NVIDIA's family of GPGPU compute products. It would be better to refer to a specific member of the Tesla family. Since you mention 1.31 TFLOPS FP64, you are referring to the Tesla K20X. Note that the K20X also has a ratio between FP64 and FP32 throughput (i.e. it can be even faster than 1.31 TFLOPS on FP32 codes).
If your algorithms depend on double, they will still run on any of the products you mention, and the accuracy of the results should be the same regardless of the product; however, the performance will be lower, depending on the product. If your algorithms depend on float, they will run faster than double on any given product, assuming floating-point throughput is the limiting factor.
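If you want to see that throughput gap on your own card, a rough sketch like the following will do (the kernel, block sizes, and iteration count are arbitrary choices of mine, not anything canonical; a real benchmark would do far more work per element and check the results):

```
// Sketch: time the same arithmetic loop in FP32 and FP64 with CUDA events.
#include <cstdio>

template <typename T>
__global__ void fma_loop(T *out, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    T a = (T)1.0001, x = (T)idx;
    for (int i = 0; i < iters; ++i)
        x = x * a + (T)0.5;          // one multiply-add per iteration
    out[idx] = x;                    // store so the loop is not optimized away
}

template <typename T>
void time_kernel(const char *label)
{
    const int threads = 256, blocks = 1024, iters = 10000;
    T *d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(T));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    fma_loop<T><<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%s: %.2f ms\n", label, ms);

    cudaFree(d_out);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}

int main()
{
    time_kernel<float>("FP32 kernel");   // float version
    time_kernel<double>("FP64 kernel");  // double version, same math
    return 0;
}
```

On a GTX 680 you should expect the double run to be much slower than the float run, roughly in line with the throughput ratios discussed above, while on a K20X the gap is much smaller.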
You may also want to consider the GeForce GTX Titan. It has double-precision floating-point performance that is roughly on par with the Tesla K20/K20X.