Browsing job advertisements, I saw the following question:
Do you understand what it takes to efficiently represent
16,777,217 as a float? [Siemens]
I don't understand the question. I know that 16,777,216 = 2^24 is the largest integer N for which every integer in [0, 1, ..., N] can be represented exactly as a float. How can 16,777,217 be represented as a float "efficiently"? As 16,777,216 + 1? But that's two floats.
Please give a hint.
Based on fairly extensive interview experience, I think this is a question designed to initiate an open-ended technical discussion, one that draws on a candidate's breadth of knowledge and experience, creativity and capacity to think "outside the box", and willingness to ask questions that establish the applicable context, restrictions, and trade-offs. In particular, the words "efficient", "represent", and "float" are all subject to interpretation.
What exactly does "float" mean? If one assumes that "float" refers to a data object of type float in C whose internal representation is the IEEE-754 binary32 format, then the answer is that it is not possible to store 16,777,217 without loss of accuracy, as the closest available encodings are 0x1.000000p+24 (16,777,216) and 0x1.000002p+24 (16,777,218).
However, the ISO-C standard does not mandate use of IEEE-754, and we might be looking at a hypothetical C implementation on ancient 36-bit machines such as the IBM 7090 or the Honeywell 6000 series, where float is mapped to the hardware's 36-bit single-precision format with 27 significand bits (most significant bit stored explicitly). Or in a more modern context, one might be working with an FPGA-based processor that uses a 40-bit floating-point format with a 31-bit stored significand for float. In either of these scenarios, the value 16,777,217 can be stored accurately in a float.
What exactly does "represent" mean? One might be working in an environment where for some reason all data is stored in floats comprising 32 bits. Using a 32-bit two's complement representation, the bit pattern corresponding to 16,777,217 is 0x01000001 and this bit pattern can be transferred to a float with memcpy(), and re-interpreted later into an integer on-the-fly at the point of use.
What exactly does "efficient" mean? For example, a given use case might only require positive floating-point operands, and one could therefore reuse the sign bit of binary32 operands to store an additional least significant significand bit, allowing lossless storage of the value 16,777,217. At the point of use, the sign bit would be used to expand the operand on-the-fly into a mid-section significand bit in a double or the tail element of a float-pair representation before applying the absolute value. For use cases that are largely bound by memory access, thus preferring the use of narrow storage types, this might be considered efficient despite the increased cost of computation due to the cost of executing additional instructions for unpacking and computation.
Other variants of compression are possible. For example, repeated leading significand bits can be compressed out for storage purposes if all float operands are close to powers of two. I actually used something like this for a processor extension with certain specialized instructions, where one operand was known to always be close to unity, allowing floating-point operands with 32 significand bits to be transported between these instructions using the IEEE-754 binary32 format. In that case, compression and decompression were performed in hardware, so the scheme was very efficient.
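A rough software analogue of the idea, with hypothetical helper names, assuming positive double operands in [1, 2) so that the sign and exponent are redundant and all 32 storage bits can hold explicit fraction bits:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Compress: drop the redundant sign/exponent of a double in [1, 2)
       and keep the 32 most significant fraction bits (truncating here). */
    static uint32_t pack_near_unity(double x)
    {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);
        return (uint32_t)(bits >> 20);   /* fraction bits 51..20 */
    }

    /* Decompress: re-attach the known sign (+) and exponent (2^0). */
    static double unpack_near_unity(uint32_t p)
    {
        uint64_t bits = 0x3ff0000000000000ull | ((uint64_t)p << 20);
        double x;
        memcpy(&x, &bits, sizeof x);
        return x;
    }

    int main(void)
    {
        double x = 1.0 + 0x1p-31;        /* close to unity */
        printf("%d\n", unpack_near_unity(pack_near_unity(x)) == x);  /* prints 1 */
        return 0;
    }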