I'm doing my PhD research in A.I. and I've gotten to the part where I have to start using CUDA libraries for my testing platform. I've played with CUDA before, and I have a basic understanding of how GPGPU works, etc, but I am troubled by the float precision.
Looking at the GTX 680 I see FP64 at 1/24 of the FP32 rate, whereas the Tesla has full FP64 at 1.31 TFLOPS. I understand very well that one is a gaming card while the other is a professional card.
The reason I am asking is simple: I cannot afford a Tesla, but I may be able to get two GTX 680s. While the main goal is to have as many CUDA cores and as much memory as possible, float precision may become a problem.
My questions are:
Thanks!
These are very subjective questions.
It's not entirely clear that you understand the difference between the C or C++ float and double datatypes. FP32 vs. FP64 refers to float and double in C or C++. The 1/8 and 1/24 numbers you refer to do not affect precision; they affect throughput. All of the GPUs you mention have some FP64 double-precision capability, so the differences come down not to capability so much as to performance.
It's very important for you to understand whether the codes you care about depend on double-precision floating point or not. It's not enough to say things like "matrix operations" to understand whether FP32 (float) or FP64 (double) matters.
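To see why the float/double distinction is about accuracy of results rather than speed, here is a tiny host-side sketch (my own illustration, compilable with nvcc or any C++ compiler; the value 0.1 and the count are arbitrary) showing how an FP32 accumulator drifts where an FP64 one stays essentially exact:

```
// Sketch: the float/double choice affects precision of results;
// the 1/8 or 1/24 ratios only affect how fast FP64 math runs.
#include <cstdio>

int main()
{
    const int n = 10 * 1000 * 1000;

    float  sumF = 0.0f;   // FP32 accumulator
    double sumD = 0.0;    // FP64 accumulator

    for (int i = 0; i < n; ++i) {
        sumF += 0.1f;     // rounding error accumulates once the sum grows large
        sumD += 0.1;      // double keeps ~15-16 significant decimal digits
    }

    // The true sum is 1,000,000; the float result will be visibly off,
    // while the double result is very close.
    printf("float  sum: %f\n", sumF);
    printf("double sum: %f\n", sumD);
    return 0;
}
```

If your results change meaningfully between the two accumulators in your own algorithms, then FP64 matters for you; if not, FP32 may be sufficient.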
If your codes depend on FP64 (double), then those performance ratios (1/8, 1/24, etc.) will be relevant. But your codes should still run, perhaps more slowly.
You're also using some terms in a fashion that may lead to confusion. Tesla refers to NVIDIA's family of GPGPU compute products. It would be better to refer to a specific member of the Tesla family. Since you mention 1.31 TFLOPS FP64, you are referring to the Tesla K20X. Note that the K20X also has a ratio between FP64 and FP32 throughput (i.e. it can be even faster than 1.31 TFLOPS on FP32 codes).
If your algorithms depend on double, they will still run on any of the products you mention, and the accuracy of the results should be the same regardless of the product; however, the performance will be lower, depending on the product. If your algorithms depend on float, they will run faster than double on any given product, assuming floating-point throughput is the limiting factor.
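If you want to see that throughput gap on your own card, a rough sketch like the following will do (the kernel, block sizes, and iteration count are arbitrary choices of mine, not anything canonical; a real benchmark would do far more work per element and check the results):

```
// Sketch: time the same arithmetic loop in FP32 and FP64 with CUDA events.
#include <cstdio>

template <typename T>
__global__ void fma_loop(T *out, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    T a = (T)1.0001, x = (T)idx;
    for (int i = 0; i < iters; ++i)
        x = x * a + (T)0.5;          // one multiply-add per iteration
    out[idx] = x;                    // store so the loop is not optimized away
}

template <typename T>
void time_kernel(const char *label)
{
    const int threads = 256, blocks = 1024, iters = 10000;
    T *d_out;
    cudaMalloc(&d_out, blocks * threads * sizeof(T));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    fma_loop<T><<<blocks, threads>>>(d_out, iters);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("%s: %.2f ms\n", label, ms);

    cudaFree(d_out);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
}

int main()
{
    time_kernel<float>("FP32 kernel");   // float version
    time_kernel<double>("FP64 kernel");  // double version, same math
    return 0;
}
```

On a GTX 680 you should expect the double run to be much slower than the float run, roughly in line with the throughput ratios discussed above, while on a K20X the gap is much smaller.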
You may also want to consider the GeForce GTX Titan. It has double-precision floating-point performance that is roughly on par with the Tesla K20/K20X.