Half-precision: Difference between __float2half vs __float2half_rn

Question

There seems to be no documentation about these two functions.

What is the difference between __float2half and __float2half_rn?

havogt · Accepted Answer

It seems the CUDA documentation is indeed a bit inadequate here.

The function unsigned short __float2half_rn(float), together with its counterpart float __half2float(unsigned short x), was already present in CUDA before the half datatype was introduced in CUDA 7.5. It is defined in device_functions.h. The comment there reads:

Convert the single-precision float value x to a half-precision floating point value represented in unsigned short format, in round-to-nearest-even mode.

The function half __float2half(float) is defined in cuda_fp16.h and apparently does the same thing, but returns a half. Its comment reads:

Converts float number a to half precision in round-to-nearest mode.
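The two comments differ only in wording ("round-to-nearest-even" versus "round-to-nearest"). One way to check how ties are actually handled is to convert values that lie exactly halfway between two representable halves. The following is a minimal sketch, assuming a newer toolkit (CUDA 9 or later) in which cuda_fp16.h provides __half_as_ushort; the helper tie_bits is a hypothetical name introduced for illustration, not part of CUDA.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Convert two exact ties and store the raw 16-bit half patterns.
__global__ void convert_ties(unsigned short* out)
{
    // 1 + 2^-11 lies exactly halfway between the halves 0x3C00 and 0x3C01.
    out[0] = __half_as_ushort(__float2half(1.00048828125f));
    // 1 + 3 * 2^-11 lies exactly halfway between the halves 0x3C01 and 0x3C02.
    out[1] = __half_as_ushort(__float2half(1.00146484375f));
}

// Hypothetical helper: fills bits[0] and bits[1]; returns false on a CUDA error.
bool tie_bits(unsigned short bits[2])
{
    unsigned short* d_bits = nullptr;
    if (cudaMalloc(&d_bits, 2 * sizeof(unsigned short)) != cudaSuccess) return false;

    convert_ties<<<1, 1>>>(d_bits);
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(bits, d_bits, 2 * sizeof(unsigned short),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_bits);
    return err == cudaSuccess;
}

int main()
{
    unsigned short bits[2] = {0, 0};
    if (!tie_bits(bits)) {
        printf("CUDA error\n");
        return 1;
    }
    // Ties-to-even picks the neighbour with an even mantissa: 0x3c00 and 0x3c02.
    printf("0x%04x 0x%04x\n", (unsigned)bits[0], (unsigned)bits[1]);
    return 0;
}
```

If the first tie rounded away from zero instead, the first value would be 0x3c01, so the pair of outputs distinguishes the two tie-breaking rules.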

However, since a half is stored as an unsigned short, I checked whether the two functions generate the same code, using the following program:

#include <stdio.h>
#include "cuda_fp16.h"
#include "device_functions.h"
__global__ void test()
{
//  auto test = __float2half( 1.4232f );    // newer function, returns half
    auto test = __float2half_rn( 1.4232f ); // older function, returns unsigned short
    printf( "%hu\n", test );
}

int main()
{
    test<<<1,1>>>();
    cudaDeviceSynchronize();
}

I found that (for sm_20) the old __float2half_rn() performs an additional 16-bit to 32-bit integer conversion and does a 32-bit store. __float2half(), on the other hand, does not perform this conversion and does a 16-bit store.

Relevant SASS code for __float2half_rn():

/*0040*/         I2I.U32.U16 R0, R0;
/*0050*/         STL [R2], R0;

For __float2half():

/*0048*/         STL.U16 [R2], R0;
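The SASS above shows how the result is stored, not the converted value itself. A complementary check is to compare the 16-bit patterns the two functions produce. The sketch below assumes a newer toolkit (CUDA 9 or later) in which cuda_fp16.h declares both __float2half and __float2half_rn as returning __half and provides __half_as_ushort; this differs from the CUDA 7.5 setup described in the answer. The helper count_mismatches and the sample inputs are hypothetical, chosen only for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_fp16.h>

// Convert each input with both functions and record the raw 16-bit patterns.
__global__ void convert_both(const float* in, unsigned short* plain,
                             unsigned short* rn, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        plain[i] = __half_as_ushort(__float2half(in[i]));
        rn[i]    = __half_as_ushort(__float2half_rn(in[i]));
    }
}

// Hypothetical helper: returns how many inputs convert to different bit
// patterns under the two functions, or -1 on a CUDA error.
int count_mismatches(const std::vector<float>& in, bool verbose)
{
    const int n = static_cast<int>(in.size());
    if (n == 0) return 0;

    float* d_in = nullptr;
    unsigned short* d_out = nullptr;  // [0, n): __float2half, [n, 2n): __float2half_rn
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, 2 * n * sizeof(unsigned short)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }

    std::vector<unsigned short> out(2 * n);
    cudaError_t err = cudaMemcpy(d_in, in.data(), n * sizeof(float),
                                 cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 128;
        convert_both<<<(n + threads - 1) / threads, threads>>>(d_in, d_out,
                                                               d_out + n, n);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) {
        // The blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(out.data(), d_out, out.size() * sizeof(unsigned short),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (verbose) {
            printf("%g -> 0x%04x (__float2half), 0x%04x (__float2half_rn)\n",
                   in[i], (unsigned)out[i], (unsigned)out[n + i]);
        }
        if (out[i] != out[n + i]) ++mismatches;
    }
    return mismatches;
}

int main()
{
    // Sample inputs: the value from the answer, a non-terminating binary
    // fraction, the largest finite half, and a value in the subnormal range.
    const std::vector<float> inputs = {1.4232f, 0.1f, 65504.0f, 1.0e-7f};
    const int mismatches = count_mismatches(inputs, true);
    printf("mismatches: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

A result of zero mismatches on a given toolkit would indicate that the two functions agree for the tested inputs; it does not prove equivalence for every float.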

Tags:

cuda

user2970139
