Why do the hash values differ for NaN and Inf - Inf?

I use this hash function a lot, e.g. to record the value of a data frame. I wanted to see if I could break it. Why aren't these hash values identical?

This requires the digest package.

Plain text output:

> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"

Additional weirdness: calculations that evaluate to NaN have identical hash values, but the hash of the literal NaN is different from theirs:

> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"

asked Jan 08 '19 16:01

King_Cordelia

1 Answer

tl;dr this has to do with very deep details of how NaNs are represented in binary. You could work around it by using digest(.,ascii=TRUE) ...
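A minimal base-R sketch of why the ascii=TRUE workaround should help, assuming digest() hands its ascii argument on to serialize() (as its documentation describes). In the ASCII serialization format a NaN is written out as text, so the sign bit never reaches the bytes being hashed:

```r
# Serialize both values in R's ASCII format instead of the binary one.
a_ascii <- serialize(Inf - Inf, connection = NULL, ascii = TRUE)
b_ascii <- serialize(NaN, connection = NULL, ascii = TRUE)

# The text form records only "NaN", not the underlying bit pattern,
# so the two byte streams come out the same.
identical(a_ascii, b_ascii)

# The tail of the stream is human-readable.
cat(rawToChar(a_ascii))
```

The trade-off is that the ASCII form is a different byte stream from the binary one, so hashes computed with ascii=TRUE will not match hashes you have already stored without it.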

Following up on @Jozef's answer: note the first of the final eight bytes (ff vs. 7f) ...

> base::serialize(Inf-Inf,connection=NULL)
 [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
> base::serialize(NaN,connection=NULL)
 [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00
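Rather than comparing the two dumps by eye, you can ask R which positions differ. This is only a sketch: whether anything differs at all depends on the platform's floating point arithmetic, so on some machines `differing` may be empty.

```r
a <- serialize(Inf - Inf, connection = NULL)
b <- serialize(NaN, connection = NULL)

# Positions at which the two raw vectors disagree (possibly none).
differing <- which(a != b)
differing

# The bytes at those positions in each stream.
a[differing]
b[differing]
```

Everything before the final eight bytes is serialization header, which is the same for both values; the double itself occupies the last eight bytes, so any difference has to show up there.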

Alternatively, using pryr::bytes() ...

> bytes(NaN)
[1] "7F F8 00 00 00 00 00 00"
> bytes(Inf-Inf)
[1] "FF F8 00 00 00 00 00 00"

The Wikipedia article on floating point format/NaNs says:

Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:

sign = either 0 or 1.

biased exponent = all 1 bits.

fraction = anything except all 0 bits (since all 0 bits represents infinity).

The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf (FF F8) is 1111 1111 1111 1000 (sign=1; exponent is all ones, as required; fraction starts with 1000) whereas NaN (7F F8) is 0111 1111 1111 1000 (the same, but with sign=0).
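If you don't want to do the hex-to-binary translation by hand, here is a rough base-R sketch that splits a double into those three fields. `double_fields()` is a made-up helper name, not something from base R, digest or pryr:

```r
# Hypothetical helper: split one double into sign / exponent / fraction.
double_fields <- function(x) {
  stopifnot(is.double(x), length(x) == 1L)
  # Eight bytes, most significant byte first.
  bytes <- writeBin(x, raw(), endian = "big")
  # rawToBits() gives each byte's bits least significant first, so put the
  # bits of each byte in a column and flip the rows to get MSB-first order.
  bits <- as.vector(matrix(as.integer(rawToBits(bytes)), nrow = 8)[8:1, ])
  list(
    hex      = paste(as.character(bytes), collapse = " "),
    sign     = bits[1],                      # 1 bit
    exponent = sum(bits[2:12] * 2^(10:0)),   # 11 bits, biased
    fraction = paste(bits[13:64], collapse = "")  # 52 bits
  )
}

double_fields(1)          # exponent is the bias, 1023
double_fields(Inf)        # exponent all ones (2047), fraction all zeros
double_fields(NaN)        # exponent all ones, fraction not all zeros
double_fields(Inf - Inf)  # compare the sign with the line above
```

The sign you see for the last two calls is exactly the platform-dependent part, so I haven't written the expected values in.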

To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0 you'd probably have to dig more deeply into the way floating point arithmetic is implemented on this platform ...

It might be worth raising an issue on the digest GitHub repo about this; I can't think of an elegant way to do it, but it seems reasonable that objects where identical(x,y) is TRUE in R should have identical hashes ... Note that identical() specifically ignores these differences in bit patterns via the single.NA (default TRUE) argument:

single.NA: logical indicating if there is conceptually just one numeric ‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit patterns.

Within the C code, it looks like identical() by default treats any two NaN values as equal (it checks whether both are NaN rather than comparing them with !=), and only when bitwise comparison is enabled does it explicitly compare the bytes in memory: see here. Note that C's != operator on its own would not give this behaviour: under IEEE 754 a NaN compares unequal to everything, including another NaN, whatever its bit pattern ...
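To see the single.NA behaviour without relying on what Inf - Inf happens to produce on your machine, you can build the two bit patterns shown earlier directly. A sketch in base R; `from_hex()` is a made-up helper for illustration:

```r
# Hypothetical helper: build a double from eight space-separated hex bytes.
from_hex <- function(hex) {
  bytes <- as.raw(strtoi(strsplit(hex, " ")[[1]], base = 16L))
  readBin(bytes, what = "double", n = 1L, size = 8L, endian = "big")
}

nan_pos <- from_hex("7F F8 00 00 00 00 00 00")  # sign bit 0
nan_neg <- from_hex("FF F8 00 00 00 00 00 00")  # sign bit 1

# Both are NaN as far as R is concerned.
is.nan(nan_pos)
is.nan(nan_neg)

# Default: conceptually just one NaN, so the sign bit is ignored.
identical(nan_pos, nan_neg)

# single.NA = FALSE differentiates bit patterns.
identical(nan_pos, nan_neg, single.NA = FALSE)
```

I would expect the last call to return FALSE and the one before it TRUE, which is the mismatch with digest(): the hash is computed from the serialized bytes and so sees a difference that identical() ignores by default.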

answered Sep 21 '22 12:09

Ben Bolker
