
Heuristic to identify whether a series of 4-byte chunks of data are integers or floats

What's the best heuristic I can use to identify whether a chunk of X 4-byte values are integers or floats? A human can do this easily, but I want to do it programmatically.

I realize that since every combination of bits will result in a valid integer and (almost?) all of them will also result in a valid float, there is no way to know for sure. But I still would like to identify the most likely candidate (which will virtually always be correct; or at least, a human can do it).

For example, let's take a series of raw 4-byte values and print each one first as an integer and then as a float:

1           1.4013e-45
10          1.4013e-44
44          6.16571e-44
5000        7.00649e-42
1024        1.43493e-42
0           0
0           0
-5          -nan
11          1.54143e-44

Obviously they will be integers.

Now, another example:

1065353216  1
1084227584  5
1085276160  5.5
1068149391  1.33333
1083179008  4.5
1120403456  100
0           0
-1110651699 -0.1
1195593728  50000

These will obviously be floats.

PS: I'm using C++, but you can answer in any language, in pseudocode, or just in English.

asked Mar 21 '10 at 00:03 by flint


1 Answer

The "common sense" heuristic in your examples basically amounts to a range check. If one interpretation gives a very large value (or a tiny fraction, very close to zero), that interpretation is probably the wrong one. Concretely, look at the exponent of the float interpretation and compare it with the exponent you get when you convert the integer interpretation to a float with a proper static_cast: the interpretation whose exponent stays in a reasonable range is the more likely one.
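
The range check described above could be sketched roughly as follows. This is one possible reading of the idea, not a definitive implementation: it assumes a 32-bit IEEE 754 `float`, the names `Guess`, `guess_type`, and the threshold `kMaxExp` are hypothetical, and the cutoff of 24 is an arbitrary choice that you would tune for your data. Each nonzero value votes for whichever interpretation has a base-2 exponent within the allowed range; zero looks the same both ways and is skipped.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <vector>

enum class Guess { Integer, Float, Unknown };

// Assumed cutoff: magnitudes between 2^-24 and 2^25 count as "plausible".
constexpr int kMaxExp = 24;

float bits_to_float(std::int32_t bits) {
    static_assert(sizeof(float) == sizeof(std::int32_t),
                  "sketch assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Float view: reject NaN/infinity and exponents far from zero
// (denormals report exponents down to -149 and are rejected too).
bool plausible_float(std::int32_t bits) {
    const float f = bits_to_float(bits);
    if (!std::isfinite(f)) return false;
    if (f == 0.0f) return true;  // covers -0.0f as well
    return std::abs(std::ilogb(f)) <= kMaxExp;
}

// Integer view: convert the value with static_cast and check its exponent.
bool plausible_int(std::int32_t value) {
    if (value == 0) return true;
    return std::ilogb(static_cast<float>(value)) <= kMaxExp;
}

Guess guess_type(const std::vector<std::int32_t>& words) {
    std::size_t int_votes = 0, float_votes = 0;
    for (std::int32_t w : words) {
        if (w == 0) continue;  // identical in both interpretations
        const bool as_int = plausible_int(w);
        const bool as_float = plausible_float(w);
        if (as_int && !as_float) ++int_votes;
        else if (as_float && !as_int) ++float_votes;
        // Both or neither plausible: no evidence either way.
    }
    if (int_votes > float_votes) return Guess::Integer;
    if (float_votes > int_votes) return Guess::Float;
    return Guess::Unknown;
}
```

With this cutoff, the first list in the question is classified as `Guess::Integer` and the second as `Guess::Float`. Majority voting is used so that a few odd values do not flip the result for the whole chunk.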

answered Sep 18 '22 at 15:09 by Alan