I was experimenting with unsafeCoerce on Int8 and Word8, and I found some surprising behaviour (for me anyway).
Word8 is an 8-bit unsigned number that ranges from 0 to 255. Int8 is a signed 8-bit number that ranges from -128 to 127.
Since they are both 8-bit numbers, I assumed that coercing one to the other would be safe, and would just reinterpret the same 8 bits as signed or unsigned.
For example, I would expect unsafeCoerce (-1 :: Int8) :: Word8 to result in a Word8 value of 255 (since the bit representation of -1 in a signed 8-bit int is the same as that of 255 in an unsigned 8-bit int).
However, when I do perform the coercion, the behaviour of the resulting Word8 is strange:
> GHCi, version 7.4.1: http://www.haskell.org/ghc/ :? for help
> import Data.Int
> import Data.Word
> import Unsafe.Coerce
> class ShowType a where typeName :: a -> String
> instance ShowType Int8 where typeName _ = "Int8"
> instance ShowType Word8 where typeName _ = "Word8"
> let x = unsafeCoerce (-1 :: Int8) :: Word8
> show x
"-1"
> typeName x
"Word8"
> show (x + 0)
"255"
> :t x
x :: Word8
> :t (x + 0)
(x + 0) :: Word8
I don't understand how show x is returning "-1" here. If you look at map show [minBound..maxBound :: Word8], no possible value for Word8 results in "-1". Also, how does adding 0 to the number change the behaviour, even though the type doesn't change? Strangely, it also appears that only the Show class is affected: my ShowType class returns the correct value.
Finally, the code fromIntegral (-1 :: Int8) :: Word8 works as expected: it returns 255 and works correctly with show. Is this code reduced to a no-op by the compiler, or can it be?
Note that this question is just out of curiosity about how types are represented in ghc at a low level. I'm not actually using unsafeCoerce in my code.
Like @kosmikus said, both Int8 and Int16 are implemented using an Int#, which is 32 bits wide on 32-bit architectures (and Word8 and Word16 are Word# under the hood). This comment in GHC.Prim explains this in more detail.
So let's find out why this implementation choice results in the behaviour you see:
> let x = unsafeCoerce (-1 :: Int8) :: Word8
> show x
"-1"
The Show instance for Word8 is defined as
instance Show Word8 where
    showsPrec p x = showsPrec p (fromIntegral x :: Int)
and fromIntegral is just fromInteger . toInteger. The definition of toInteger for Word8 is
toInteger (W8# x#) = smallInteger (word2Int# x#)
where smallInteger (defined in integer-gmp) is
smallInteger :: Int# -> Integer
smallInteger i = S# i
and word2Int# is a primop with type Word# -> Int#, an analog of reinterpret_cast<int> in C++. So that explains why you see -1 in the first example: the value is just reinterpreted as a signed integer and printed out.
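To make that chain concrete without touching primops, here is a rough model in ordinary Haskell. It assumes the representation described above (a full machine word behind the Word8 constructor); the names reinterpretAsInt, coercedPayload and showLikeWord8 are made up for illustration and are not GHC functions.

```haskell
import Data.Word (Word)

-- Hypothetical stand-in for the word2Int# primop:
-- same bits, read back as a signed machine integer.
reinterpretAsInt :: Word -> Int
reinterpretAsInt = fromIntegral

-- Model of the machine word left behind by the coerced (-1 :: Int8):
-- the full-width bit pattern of -1, i.e. every bit set.
coercedPayload :: Word
coercedPayload = fromIntegral (-1 :: Int)

-- Model of the Show path: reinterpret as signed, then print.
-- No narrowing to 8 bits happens anywhere on this path.
showLikeWord8 :: Word -> String
showLikeWord8 w = show (reinterpretAsInt w)
```

In this model, showLikeWord8 coercedPayload gives "-1", while a well-formed payload such as 255 still prints as "255", because its upper bits are zero.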
Now, why would adding 0 to x give you 255? Looking at the Num instance for Word8 we see this:
(W8# x#) + (W8# y#) = W8# (narrow8Word# (x# `plusWord#` y#))
So it looks like the narrow8Word# primop is the culprit. Let's check:
> import GHC.Word
> import GHC.Prim
> case x of (W8# w) -> (W8# (narrow8Word# w))
255
Indeed it is. That explains why adding 0 is not a no-op: Word8 addition actually truncates the result to its low 8 bits, which brings the value back into the intended range.
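The narrowing step can be sketched in plain Haskell as a mask over a full-width word. This is only a model under the assumption that narrowing keeps the low 8 bits; narrow8 and addLikeWord8 are hypothetical names, not the real primop or the real Num instance.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word)

-- Hypothetical stand-in for narrow8Word#: keep only the low 8 bits.
narrow8 :: Word -> Word
narrow8 w = w .&. 0xFF

-- Model of Word8 addition: add at full width, then narrow the result.
addLikeWord8 :: Word -> Word -> Word
addLikeWord8 x y = narrow8 (x + y)
```

With an all-ones payload, addLikeWord8 maxBound 0 is 255, which matches show (x + 0) in the session above. Ordinary overflow goes through the same mask, so addLikeWord8 200 100 is 44 rather than 255: the value wraps around, it is not saturated.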
You can't say something is wrong when you've used unsafeCoerce. Anything can happen if you use that function. The compiler probably stores an Int8 in a word, and using unsafeCoerce to Word8 breaks the invariants on what is stored in this word. Use fromIntegral to convert.
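As a small sketch of the safe route, the two helpers below (int8ToWord8 and word8ToInt8 are illustrative names, not library functions) wrap fromIntegral, which converts modulo 2^8 and therefore preserves the 8-bit pattern in both directions.

```haskell
import Data.Int (Int8)
import Data.Word (Word8)

-- Signed to unsigned: same 8 bits, read as a value in 0..255.
int8ToWord8 :: Int8 -> Word8
int8ToWord8 = fromIntegral

-- Unsigned to signed: same 8 bits, read as a value in -128..127.
word8ToInt8 :: Word8 -> Int8
word8ToInt8 = fromIntegral
```

Here int8ToWord8 (-1) is 255 and shows as "255", and word8ToInt8 (int8ToWord8 n) gives back n for every Int8, so nothing is lost by avoiding unsafeCoerce.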
Conversion from Int8 to Word8 using fromIntegral turns into a movzbl instruction using ghc on x86, which is basically a no-op.