
Maximum numerical information density with printf

My use case is writing numbers to a JSON document where minimising size matters more than the precision of very small or very large numbers. The numbers usually represent everyday units such as milliseconds or metres, so they tend to fall in the [0.001,1000] range.

Essentially I'd like to set a maximum character length. For example, if the limit were five characters, then:

from      to

1234567   123e4
12345.6   12346
1234.56   1235
123.456   123.5
12.3456   12.35
1.23456   1.235
1.23450   1.235
1.23400   1.234
1.23000   1.23
1.20000   1.2
1.00000   1
0.11111   0.111
0.01111   0.011
0.00111   0.001
0.00011   11e-5
0.00001   1e-5
0.11111   0.111
0.01111   0.011
0.00111   0.001
0.00011   11e-5
0.00001   1e-5

This mapping seems to convey the most information within the length constraint.

It does fail for numbers whose decimal exponent lies outside the range [-99,999], and that range varies with the length limit. Perhaps the fallback is simply to write a longer string in these rare cases.

This is the ideal, though I could live without implementing it myself if an existing solution comes reasonably close, perhaps one that truncates instead of rounding and does not take advantage of scientific (exponent) notation.
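For reference, here is a rough sketch of one way to get this behaviour with plain `snprintf`/`strtod`: generate every fixed-point and integer-mantissa exponent candidate, then keep the one that fits the limit and parses back closest to the input. The names `format_compact`, `consider` and `strip_zeros` are made up for illustration, finite input is assumed, and halfway cases follow printf's rounding of the stored binary value, so 1.2345 may still come out as 1.234. Because it picks purely by absolute error, it can prefer exponent form where fixed notation would also fit (0.00111 comes out as 11e-4), and magnitudes too small to express within the limit collapse to 0.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Drop trailing zeros after a decimal point, then a trailing bare point. */
static void strip_zeros(char *s) {
    if (!strchr(s, '.')) return;
    size_t n = strlen(s);
    while (n > 0 && s[n - 1] == '0') s[--n] = '\0';
    if (n > 0 && s[n - 1] == '.') s[--n] = '\0';
}

/* Adopt cand if it fits in width and is closer to x (ties: shorter wins). */
static void consider(const char *cand, double x, size_t width,
                     char *best, double *best_err) {
    size_t len = strlen(cand);
    if (len == 0 || len > width) return;
    double err = fabs(strtod(cand, NULL) - x);
    if (err < *best_err || (err == *best_err && len < strlen(best))) {
        strcpy(best, cand);
        *best_err = err;
    }
}

/* Hypothetical helper: writes the closest representation of a finite x that
   fits in width characters; falls back to "%g" when nothing fits. */
static void format_compact(double x, size_t width, char *out, size_t outsz) {
    char best[80] = "", tmp[512], cand[512];
    double best_err = INFINITY;
    assert(out != NULL && outsz > 0);
    if (width > 64) width = 64;  /* keeps every candidate inside best[] */

    /* Fixed notation with 0..width decimals. */
    for (size_t p = 0; p <= width; p++) {
        snprintf(cand, sizeof cand, "%.*f", (int)p, x);
        strip_zeros(cand);
        consider(cand, x, width, best, &best_err);
    }

    /* Integer mantissa with d significant digits plus exponent, e.g. 123e4. */
    for (size_t d = 1; d <= width; d++) {
        snprintf(tmp, sizeof tmp, "%.*e", (int)(d - 1), x);
        char *e = strchr(tmp, 'e');
        if (!e) continue;  /* inf or nan */
        int k = atoi(e + 1) - (int)(d - 1);
        char digits[80];
        size_t n = 0;
        for (char *c = tmp; c < e; c++)
            if (*c >= '0' && *c <= '9') digits[n++] = *c;
        while (n > 1 && digits[n - 1] == '0') { n--; k++; }
        digits[n] = '\0';
        snprintf(cand, sizeof cand, "%s%se%d", x < 0 ? "-" : "", digits, k);
        consider(cand, x, width, best, &best_err);
    }

    if (best[0] != '\0')
        snprintf(out, outsz, "%s", best);
    else
        snprintf(out, outsz, "%g", x);
}
```

Calling `format_compact(1234567.0, 5, buf, sizeof buf)` leaves `123e4` in `buf`, and `0.00011` becomes `11e-5`. It does a couple of dozen `snprintf`/`strtod` round trips per number, so it trades speed for simplicity.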

EDIT: Here's what printf produces with %.3f, %.3g and %.4g by comparison (code here):

printf("%.3f");

match 0 - 1.23457e+06 -> 1234567.000 expected 12e5
match 0 - 12345.6     -> 12345.600   expected 12346
match 0 - 1234.56     -> 1234.560    expected 1235
match 0 - 123.456     -> 123.456     expected 123.5
match 0 - 12.3456     -> 12.346      expected 12.35
match 1 - 1.23456     -> 1.235
match 0 - 1.2345      -> 1.234       expected 1.235
match 1 - 1.234       -> 1.234
match 0 - 1.23        -> 1.230       expected 1.23
match 0 - 1.2         -> 1.200       expected 1.2
match 0 - 1           -> 1.000       expected 1
match 1 - 0.11111     -> 0.111
match 1 - 0.01111     -> 0.011
match 1 - 0.00111     -> 0.001
match 0 - 0.00011     -> 0.000       expected 11e-4
match 0 - 1e-05       -> 0.000       expected 1e-5
match 1 - 0.11111     -> 0.111
match 1 - 0.01111     -> 0.011
match 1 - 0.00111     -> 0.001
match 0 - 0.00011     -> 0.000       expected 11e-4
match 0 - 1e-05       -> 0.000       expected 1e-5

printf("%.3g");

match 0 - 1.23457e+06 -> 1.23e+06  expected 12e5
match 0 - 12345.6     -> 1.23e+04  expected 12346
match 0 - 1234.56     -> 1.23e+03  expected 1235
match 0 - 123.456     -> 123       expected 123.5
match 0 - 12.3456     -> 12.3      expected 12.35
match 0 - 1.23456     -> 1.23      expected 1.235
match 0 - 1.2345      -> 1.23      expected 1.235
match 0 - 1.234       -> 1.23      expected 1.234
match 1 - 1.23        -> 1.23
match 1 - 1.2         -> 1.2
match 1 - 1           -> 1
match 1 - 0.11111     -> 0.111
match 0 - 0.01111     -> 0.0111    expected 0.011
match 0 - 0.00111     -> 0.00111   expected 0.001
match 0 - 0.00011     -> 0.00011   expected 11e-4
match 0 - 1e-05       -> 1e-05     expected 1e-5
match 1 - 0.11111     -> 0.111
match 0 - 0.01111     -> 0.0111    expected 0.011
match 0 - 0.00111     -> 0.00111   expected 0.001
match 0 - 0.00011     -> 0.00011   expected 11e-4
match 0 - 1e-05       -> 1e-05     expected 1e-5

printf("%.4g");

match 0 -> 1.23457e+06 -> 1.235e+06 expected 12e5
match 0 -> 12345.6     -> 1.235e+04 expected 12346
match 1 -> 1234.56     -> 1235
match 1 -> 123.456     -> 123.5
match 1 -> 12.3456     -> 12.35
match 1 -> 1.23456     -> 1.235
match 0 -> 1.2345      -> 1.234     expected 1.235
match 1 -> 1.234       -> 1.234
match 1 -> 1.23        -> 1.23
match 1 -> 1.2         -> 1.2
match 1 -> 1           -> 1
match 0 -> 0.11111     -> 0.1111    expected 0.111
match 0 -> 0.01111     -> 0.01111   expected 0.011
match 0 -> 0.00111     -> 0.00111   expected 0.001
match 0 -> 0.00011     -> 0.00011   expected 11e-4
match 0 -> 1e-05       -> 1e-05     expected 1e-5
match 0 -> 0.11111     -> 0.1111    expected 0.111
match 0 -> 0.01111     -> 0.01111   expected 0.011
match 0 -> 0.00111     -> 0.00111   expected 0.001
match 0 -> 0.00011     -> 0.00011   expected 11e-4
match 0 -> 1e-05       -> 1e-05     expected 1e-5
Drew Noakes asked Oct 31 '22 14:10


1 Answer

For packing numbers within a certain range into the smallest unsigned integer:

1) Subtract the smallest possible value. For example, if your numbers may range from 0.001 to 100000 and a specific number is 123.456, then subtract 0.001 to get 123.455

2) Divide by the precision you care about. For example, if you care about thousandths then divide by 0.001. In this case the number 123.455 becomes 123455

Once you've done this and have the smallest width unsigned integer, convert it to hexadecimal digits (or maybe "base 32 digits"). For the example above, 0.001 would become 0x00000000, 123.456 would become 0x0001E23F and 100000 would become 0x05F5E0FF.

If you want "variable precision", you can add a third step that splits the unsigned integer value into "value and shift count" form. For example:

    unsigned shift_count = 0;
    /* Halve the value until it fits in 12 bits (three hex digits). */
    while (value > 0xFFF) {
        value >>= 1;
        shift_count++;
    }

Then you can combine the two with something like value = (value << 4) | shift_count, which keeps the 12-bit value in the top three hex digits and the shift count in the last one.

In that way, you could compress your numbers down to 4 hexadecimal digits. For the examples above, 0.001 would become 0x0000 (exactly representing 0.001), 123.456 would become 0xF115 (actually representing 123.425) and 100000 would become 0xBEBF (actually representing 99975.169).
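Putting the shift loop and the concatenation together, a sketch of the pack and unpack round trip could look like this. `pack16` and `unpack16` are illustrative names, and the sketch assumes the input is below 2^27 so that the shift count fits in four bits.

```c
#include <assert.h>
#include <stdint.h>

/* Pack v as a 12-bit value and a 4-bit shift count: (value << 4) | shift. */
static uint16_t pack16(uint32_t v) {
    unsigned shift = 0;
    assert(v < (1u << 27));  /* larger inputs would need a shift above 15 */
    while (v > 0xFFF) {
        v >>= 1;             /* discards the lowest bit, so precision drops */
        shift++;
    }
    return (uint16_t)((v << 4) | shift);
}

/* Reverse the packing; the bits shifted out earlier come back as zeros. */
static uint32_t unpack16(uint16_t p) {
    return (uint32_t)(p >> 4) << (p & 0xF);
}
```

For example, `pack16(123455)` yields `0xF115`, and `unpack16(0xF115)` returns 123424, which maps back to 123.425 after multiplying by 0.001 and adding the 0.001 offset.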

Brendan answered Nov 15 '22 05:11