There seems to be no documentation about these two functions.
What is the difference between __float2half and __float2half_rn?
It seems the CUDA documentation is indeed a bit inadequate here.
The function unsigned short __float2half_rn(float), together with float __half2float(unsigned short x), was already present in CUDA before the new half datatype was introduced in CUDA 7.5.
It is defined in device_functions.h. The comment there reads:
Convert the single-precision float value x to a half-precision floating point value represented in unsigned short format, in round-to-nearest-even mode.
The function half __float2half(float) is defined in cuda_fp16.h and apparently does the same thing, but returns a half:
Converts float number a to half precision in round-to-nearest mode.
However, since half is a typedef to unsigned short, I checked whether they do the same, with the following code:
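#include <stdio.h>
#include "cuda_fp16.h"
#include "device_functions.h"

__global__ void test()
{
    // Swap the two lines below to compare the two intrinsics.
    // auto test = __float2half( 1.4232 );
    auto test = __float2half_rn( 1.4232 );
    // Print the raw 16-bit pattern of the converted value.
    printf( "%hu\n", test );
}

int main()
{
    test<<<1,1>>>();
    // Wait for the kernel so the device-side printf output is flushed.
    cudaDeviceSynchronize();
}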
I found that (for sm_20) the old __float2half_rn() has an additional 16-bit to 32-bit integer conversion and does a 32-bit store. On the other hand, __float2half() does not have this conversion and does a 16-bit store.
Relevant SASS code for __float2half_rn():
/*0040*/ I2I.U32.U16 R0, R0;
/*0050*/ STL [R2], R0;
For __float2half():
/*0048*/ STL.U16 [R2], R0;
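To check on the host which bit pattern a round-to-nearest-even conversion should produce, a software reference can help. The following is only a sketch under the assumption that the result is an IEEE 754 binary16 bit pattern; float_to_half_bits_rn is a hypothetical helper name and is not part of CUDA, and it is not claimed to match the intrinsics' NaN payload handling.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical host-side reference (not a CUDA API): converts a float to
// IEEE 754 binary16 bits using round-to-nearest-even.
static uint16_t float_to_half_bits_rn(float f)
{
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);            // reinterpret the float's bits

    uint32_t sign = (x >> 16) & 0x8000u;      // sign moved to bit 15
    uint32_t exp  = (x >> 23) & 0xFFu;        // biased float exponent
    uint32_t man  = x & 0x7FFFFFu;            // 23-bit float mantissa

    if (exp == 0xFFu) {
        // Inf stays Inf; any NaN becomes a quiet NaN (payload is dropped).
        return static_cast<uint16_t>(sign | 0x7C00u | (man ? 0x0200u : 0u));
    }

    int e = static_cast<int>(exp) - 127 + 15; // re-bias for half precision

    if (e >= 31) {
        // Too large for half precision: overflow to infinity.
        return static_cast<uint16_t>(sign | 0x7C00u);
    }

    if (e <= 0) {
        // Result is a half-precision subnormal or zero.
        if (e < -10) {
            return static_cast<uint16_t>(sign); // below half of the smallest subnormal
        }
        man |= 0x800000u;                       // make the implicit leading 1 explicit
        int shift = 14 - e;                     // between 14 and 24
        uint32_t h       = man >> shift;
        uint32_t rem     = man & ((1u << shift) - 1u);
        uint32_t halfway = 1u << (shift - 1);
        if (rem > halfway || (rem == halfway && (h & 1u))) {
            ++h;                                // round up; ties go to even
        }
        return static_cast<uint16_t>(sign | h);
    }

    // Normal case: keep the top 10 mantissa bits, round on the 13 dropped bits.
    uint32_t h   = (static_cast<uint32_t>(e) << 10) | (man >> 13);
    uint32_t rem = man & 0x1FFFu;
    if (rem > 0x1000u || (rem == 0x1000u && (h & 1u))) {
        ++h;                                    // a carry into the exponent is intended
    }
    return static_cast<uint16_t>(sign | h);
}
```

With this sketch, float_to_half_bits_rn(1.4232f) gives 0x3DB1 (15793 in decimal), which is the value the test kernel above would be expected to print for either intrinsic if both round to nearest even.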