Python numpy float16 datatype operations, and float8?

Tags: python, floating-point, precision, numpy

When performing math operations on NumPy float16 numbers, the result is also a float16 number. My question is: how exactly is the result computed? Say I'm multiplying or adding two float16 numbers: does Python generate the result in float32 and then truncate/round it to float16? Or is the calculation performed in 16-bit multiplier/adder hardware all the way?

Another question: is there a float8 type? I couldn't find one. If not, then why? Thank you all!

965

asked Aug 16 '16 13:08

JonyK

2 Answers

To the first question: there's no hardware support for float16 on a typical processor (at least outside the GPU). NumPy does exactly what you suggest: convert the float16 operands to float32, perform the scalar operation on the float32 values, then round the float32 result back to float16. It can be proved that the results are still correctly-rounded: the precision of float32 is large enough (relative to that of float16) that double rounding isn't an issue here, at least for the four basic arithmetic operations and square root.

In the current NumPy source, this is what the definition of the four basic arithmetic operations looks like for float16 scalar operations.

#define half_ctype_add(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) + npy_half_to_float(b))
#define half_ctype_subtract(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) - npy_half_to_float(b))
#define half_ctype_multiply(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) * npy_half_to_float(b))
#define half_ctype_divide(a, b, outp) *(outp) = \
        npy_float_to_half(npy_half_to_float(a) / npy_half_to_float(b))

The code above is taken from scalarmath.c.src in the NumPy source. You can also take a look at loops.c.src for the corresponding code for array ufuncs. The supporting npy_half_to_float and npy_float_to_half functions are defined in halffloat.c, along with various other support functions for the float16 type.
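The claim can be checked from Python. The following is a minimal sketch, assuming NumPy is installed; the random inputs, the seed, and the variable names are illustrative choices, not anything taken from the NumPy source. It compares NumPy's float16 results against the same operations done explicitly in float32 and in float64, each followed by a single rounding to float16.

```python
import numpy as np

# Illustrative inputs: values in [-100, 100] so that products stay
# well below the float16 maximum (65504) and nothing overflows.
rng = np.random.default_rng(0)
a = rng.uniform(-100, 100, size=10_000).astype(np.float16)
b = rng.uniform(-100, 100, size=10_000).astype(np.float16)

# What NumPy returns for float16 operands.
direct_sum = a + b
direct_product = a * b

# The same operations done explicitly in float32, then rounded to float16.
via32_sum = (a.astype(np.float32) + b.astype(np.float32)).astype(np.float16)
via32_product = (a.astype(np.float32) * b.astype(np.float32)).astype(np.float16)

# Reference: for these inputs float64 holds the sums and products exactly,
# so one rounding to float16 gives the correctly rounded result.
via64_sum = (a.astype(np.float64) + b.astype(np.float64)).astype(np.float16)
via64_product = (a.astype(np.float64) * b.astype(np.float64)).astype(np.float16)

print(np.array_equal(direct_sum, via32_sum), np.array_equal(direct_sum, via64_sum))
print(np.array_equal(direct_product, via32_product), np.array_equal(direct_product, via64_product))
```

If the float32 route suffered from double rounding, the float32 and float64 results would disagree somewhere; for addition and multiplication they should not.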

For the second question: no, there's no float8 type in NumPy. float16 is a standardized type (described in the IEEE 754 standard) that's already in wide use in some contexts (notably GPUs). There's no IEEE 754 float8 type, and there doesn't appear to be an obvious candidate for a "standard" float8 type. I'd also guess that there just hasn't been that much demand for float8 support in NumPy.

153

answered Oct 05 '22 21:10

Mark Dickinson

This answer builds on the float8 aspect of the question. The accepted answer covers the rest pretty well. One of the main reasons there isn't a widely accepted float8 type, other than the lack of a standard, is that it's not very useful in practice.

Primer on Floating Point

In standard notation, a float[n] data type is stored using n bits in memory, so it can represent at most 2^n distinct values. In IEEE 754, a handful of these bit patterns, like nan, aren't even numbers as such. That means every floating-point representation (even a float256) has gaps between the numbers it can represent, and any number that falls in a gap is rounded to the nearest representable value. Generally, the higher the n, the smaller these gaps are.

You can see the gaps in action if you use the struct package to get the binary representation of some float32 numbers. It's a bit startling to run into at first, but near one billion adjacent float32 values are 64 apart, so the 32 integers just above 1,000,000,000 all round to the same float32 value:

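import struct

# Pack one billion as a 4-byte float32.
billion_as_float32 = struct.pack('f', 1000000000)
for i in range(32):
    # Each of the next 32 integers packs to the same bytes as one billion.
    print(billion_as_float32 == struct.pack('f', 1000000001 + i))  # True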
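The size of the gap can also be read off directly. This is a small sketch, assuming NumPy is installed; it uses np.spacing, which returns the distance from a value to the next representable value of the same type. The sample values 1000 and 1e9 are arbitrary choices for illustration.

```python
import numpy as np

# Distance from x to the next representable number of the same dtype.
gap16 = np.spacing(np.float16(1000))   # 0.5
gap32 = np.spacing(np.float32(1e9))    # 64.0
gap64 = np.spacing(np.float64(1e9))    # 2**-23, about 1.19e-07

print(gap16, gap32, gap64)
```

The same magnitude that float64 resolves to a fraction of a microunit is only resolved to the nearest 64 in float32.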

Generally, floating point is best at tracking only the most significant bits, so that if your numbers have the same scale, the important differences are preserved. Floating-point formats differ mainly in how they distribute the available bits between a significand (mantissa) and an exponent. For instance, IEEE 754 float32 uses 1 sign bit, 8 exponent bits, and 23 stored significand bits, which gives 24 bits of precision once the implicit leading bit is counted.
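To make the float32 bit split concrete, here is a short sketch using only the standard library. The helper name float32_fields is made up for this example; it reinterprets the four bytes of a float32 as an unsigned integer and masks out the three fields.

```python
import struct

def float32_fields(x):
    """Return (sign, biased exponent, fraction) of x stored as IEEE 754 float32."""
    # Pack as little-endian float32, then reread the same bytes as a 32-bit unsigned int.
    (bits,) = struct.unpack('<I', struct.pack('<f', x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF       # 23 stored bits of the significand
    return sign, exponent, fraction

print(float32_fields(1.0))    # (0, 127, 0)
print(float32_fields(-2.0))   # (1, 128, 0)
```

The leading 1 of the significand is implicit for normal numbers, which is why 1.0 and -2.0 both show a fraction field of 0.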

Back to `float8`

By the above logic, a float8 value can only ever take on 256 distinct values, no matter how clever you are in splitting the bits between significand and exponent. Unless you're keen on it rounding numbers to one of 256 arbitrary numbers clustered near zero, it's probably more efficient to just track the 256 possibilities in an int8.
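To see what such a type would look like, the sketch below enumerates every bit pattern of one possible float8. The layout is an assumption made purely for illustration (1 sign bit, 4 exponent bits with bias 7, 3 fraction bits, IEEE-style zeros, subnormals, infinities and NaNs); it is not a NumPy type, and decode_float8 is a made-up helper.

```python
import math

# Hypothetical float8 layout, chosen only for illustration:
# 1 sign bit, 4 exponent bits (bias 7), 3 fraction bits.
EXP_BITS, FRAC_BITS, BIAS = 4, 3, 7

def decode_float8(bits):
    """Decode one 8-bit pattern (0..255) under the hypothetical layout above."""
    sign = -1.0 if bits >> 7 else 1.0
    exponent = (bits >> FRAC_BITS) & (2**EXP_BITS - 1)
    fraction = bits & (2**FRAC_BITS - 1)
    if exponent == 2**EXP_BITS - 1:
        # All-ones exponent is reserved for infinities and NaNs.
        return sign * math.inf if fraction == 0 else math.nan
    if exponent == 0:
        # Zero and subnormals: no implicit leading 1.
        return sign * (fraction / 2**FRAC_BITS) * 2.0 ** (1 - BIAS)
    return sign * (1 + fraction / 2**FRAC_BITS) * 2.0 ** (exponent - BIAS)

values = [decode_float8(b) for b in range(256)]
# +0.0 and -0.0 compare equal, so the set keeps only one of them.
finite = sorted({v for v in values if math.isfinite(v)})
below_one = [v for v in finite if abs(v) < 1]

print(len(finite), max(finite), len(below_one))   # 239 240.0 111
```

Under this particular layout, 111 of the 239 distinct finite values have a magnitude below 1, and the largest representable value is only 240.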

For instance, if you wanted to track a very small range with coarse precision you could divide the range you wanted into 256 points and then store which of the 256 points your number was closest to. If you wanted to get really fancy you could have a non-linear distribution of values either clustered at the centre or at the edges depending on what mattered most to you.
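The linear version of that scheme might look roughly like the following. This is a sketch under stated assumptions: NumPy is installed, the function names quantize and dequantize are invented for the example, and the codes are stored as uint8 (0 to 255) rather than int8 simply to avoid an offset.

```python
import numpy as np

def quantize(values, lo, hi):
    """Map values in [lo, hi] to the nearest of 256 evenly spaced points, stored as uint8."""
    values = np.clip(np.asarray(values, dtype=np.float64), lo, hi)
    codes = np.rint((values - lo) / (hi - lo) * 255)
    return codes.astype(np.uint8)

def dequantize(codes, lo, hi):
    """Recover the approximate values that the codes stand for."""
    return lo + codes.astype(np.float64) / 255 * (hi - lo)

x = np.array([0.0, 0.25, 1.0])
codes = quantize(x, 0.0, 1.0)
restored = dequantize(codes, 0.0, 1.0)
print(codes, restored)
```

The round-trip error is at most half a step, that is (hi - lo) / 510. A non-linear variant would replace the linear mapping with a lookup table of 256 hand-picked points.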

The likelihood of anyone else (or even yourself later on) needing this exact scheme is extremely small, and most of the time the extra one or three bytes you pay as a penalty for using float16 or float32 instead is too small to make a meaningful difference. Hence almost no one bothers to write up a float8 implementation.

answered Oct 05 '22 21:10

jeteon
