Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Convert String to Int checking for overflow

When I tried to convert a very long integer to Int, I was surprised that no error was thrown:

Prelude> read "123456789012345678901234567890" :: Int
-4362896299872285998

readMaybe from Text.Read module gives the same result.

Two questions:

  • Which function should I call to perform a safe conversion?
  • How can the most type safe language on Earth allow such unsafe things?

Update 1:

This is my attempt to write a version of read that checks bounds:

{-# LANGUAGE ScopedTypeVariables #-}

parseIntegral :: forall a . (Integral a, Bounded a) => String -> Maybe a
parseIntegral s = integerToIntegral (read s :: Integer) where
  integerToIntegral n | n < fromIntegral (minBound :: a) = Nothing
  integerToIntegral n | n > fromIntegral (maxBound :: a) = Nothing
  integerToIntegral n = Just $ fromInteger n

Is it the best I can do?

like image 855
ZhekaKozlov Avatar asked Nov 01 '25 20:11

ZhekaKozlov


1 Answers

Background: why unchecked overflow is actually wonderful

Haskell 98 leaves overflow behavior explicitly unspecified, which is good for implementers and bad for everyone else. Haskell 2010 discusses it in two sections—in the section inherited from Haskell 98, it's left explicitly unspecified, whereas in the sections on Data.Int and Data.Word, it is specified. This inconsistency will hopefully be resolved eventually.

GHC is kind enough to specify it explicitly:

All arithmetic is performed modulo 2^n, where n is the number of bits in the type.

This is an extremely useful specification. In particular, it guarantees that Int, Word, Int64, Word32, etc., form rings, and even principal ideal rings, under addition and multiplication. This means that arithmetic will always work right—you can transform expressions and equations in lots of different ways without breaking things. Throwing exceptions on overflow would break all these properties, making it much more difficult to write and reason about programs. The only times you really need to be careful are when you use comparison operators like < and compare—fixed width integers do not form ordered groups, so these operators are a bit touchy.

Why it makes sense not to check reads

Reading an integer involves many multiplications and additions. It also needs to be fast. Checking to make sure the read is "valid" is not so easy to do quickly. In particular, while it's easy to find out whether an addition has overflowed, it is not easy to find out whether a multiplication has. The only sensible ways I can think of to perform a checked read for Int are

  1. Read as an Integer, check, then convert. Integer arithmetic is significantly more expensive than Int arithmetic. For smaller things, like Int16, the read can be done with Int, checking for Int16 overflow, then narrowed. This is cheaper, but still not free.

  2. Compare the number in decimal to maxBound (or, for a negative number, minBound) while reading. This seems more likely to be reasonably efficient, but there will still be some cost. As the first section of this answer explains, there is nothing inherently wrong with overflow, so it's not clear that throwing an error is actually better than giving an answer modulo 2^n.

like image 109
dfeuer Avatar answered Nov 03 '25 21:11

dfeuer



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!