Evaluation of 'and' clause with regards to laziness

I don't understand why the following code behaves the way it does:

myand :: Bool -> Bool -> Bool
myand True True = True
myand _ _ = False

containsAandB :: String -> IO Bool
containsAandB s = do
  containsA <- do
    putStrLn $ "Check if 'a' is in " ++ s
    return $ 'a' `elem` s
  containsB <- do
    putStrLn $ "Check if 'b' is in " ++ s
    return $ 'b' `elem` s
  return $ containsA `myand` containsB

This is the output when I test the function:

*Main> containsAandB "def"
Check if 'a' is in def
Check if 'b' is in def
False

Note that (&&) behaves just like 'myand'; I only wrote a custom function to better visualize what's happening. I'm surprised by the "Check if 'b' is in def" line, since containsA is already False, so 'myand' can be evaluated regardless of containsB.

Question 1:
Is there a particular reason why containsB has to be evaluated too? My understanding was that the containsB <- do ... statement is not evaluated unless it's required but my guess is that this might behave differently since it's IO and therefore not free of side effects?

Question 2:
What's the best practice approach to get the desired behavior (if containsA is false, containsB is not checked) without dealing with nested if-else clauses?

ryan91 asked Dec 23 '22 06:12


2 Answers

Question 1: Is there a particular reason why containsB has to be evaluated too? My understanding was that the containsB <- do ... statement is not evaluated unless it's required but my guess is that this might behave differently since it's IO and therefore not free of side effects?

Your experiment behaves differently because it performs IO. An important property of IO is that the order of IO statements is respected. So even if, due to laziness, we do not need a certain value, the IO part is still executed.

This is logical in the world of IO: imagine that we read a file in three parts. We read the first two parts, and then the third. If, due to laziness, the second IO command were never executed, the third read would actually return the second part of the file.

So in short, because of IO, the statements are executed, but only the IO statements. The value wrapped inside the return is not evaluated unless you need it: the check 'b' `elem` s only happens when its result is demanded.
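The distinction between running an IO statement and evaluating the value it returns can be made visible with a small sketch. The name demo and the error message are made up for illustration; the second block returns a value that would crash if it were ever forced.

```haskell
myand :: Bool -> Bool -> Bool
myand True True = True
myand _ _ = False

-- Hypothetical demo: both IO effects run, but the pure value
-- returned by the second block is never forced.
demo :: IO Bool
demo = do
  containsA <- do
    putStrLn "effect A runs"
    return False
  containsB <- do
    putStrLn "effect B runs too"
    -- Would throw if evaluated; myand never looks at it here.
    return (error "pure check B was forced" :: Bool)
  return (containsA `myand` containsB)
```

Running demo in GHCi should print both messages and then yield False without raising the error, because myand already fails to match on its first argument.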

There are, however, ways to "trick" your way out of this. For example, trace (from the Debug.Trace module) performs "unsafe IO": it prints the given message when the value it wraps is evaluated. If we write:

Prelude> import Debug.Trace
Prelude Debug.Trace> myand (trace "a" False) (trace "b" False)

we get:

Prelude Debug.Trace> myand (trace "a" False) (trace "b" False)
a
False

Question 2: What's the best practice approach to get the desired behavior (if containsA is false, containsB is not checked) without dealing with nested if-else clauses?

Well, as said before, the normal behavior is that containsB is not evaluated. But if you perform IO actions, those have to be performed before you actually do the checking. This is basically one of the aspects that the (>>=) operator for IO (which you use implicitly in a do block) handles.

Willem Van Onsem answered Jan 06 '23 08:01


do blocks get translated into calls to >>= and >>. In particular, your code becomes

containsAandB s =
  (putStrLn ("Check if 'a' is in " ++ s) >>
   return ('a' `elem` s)) >>= (\containsA ->
  (putStrLn ("Check if 'b' is in " ++ s) >>
   return ('b' `elem` s)) >>= (\containsB ->
  return (containsA `myand` containsB)))

So containsB <- do ... isn't really a statement; it makes the do ... part the first argument to a >>= call. And >>= (and >>) for IO is defined so that it always runs its first argument. So to get to the last return, both putStrLn calls must already have run.
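That >>= and >> for IO always run their first argument, even when its result is thrown away, can be checked with a counter. This is only a sketch: countEffects and tick are hypothetical names, and an IORef is used just to record how often the action ran.

```haskell
import Data.IORef

-- Hypothetical demo: count how many times an action is executed
-- when its result is ignored every time.
countEffects :: IO Int
countEffects = do
  counter <- newIORef (0 :: Int)
  let tick = modifyIORef counter (+1) >> return False
  _ <- tick            -- result discarded, effect still happens
  _ <- tick            -- same again
  readIORef counter    -- both increments have been applied
```

Here countEffects yields 2: neither Bool result is ever inspected, yet both increments take place because the sequencing is done by >>=, not by demand for the values.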

This behavior isn't limited to the IO monad; e.g. see Difference between Haskell's Lazy and Strict monads (or transformers).

What's the best practice approach to get the desired behavior (if containsA is false, containsB is not checked) without dealing with nested if-else clauses?

You can deal with them once and for all:

andM :: (Monad m) => m Bool -> m Bool -> m Bool
andM m1 m2 = do
  x <- m1
  case x of
    True -> m2
    False -> return False

containsAandB s = andM
  (do
    putStrLn $ "Check if 'a' is in " ++ s
    return $ 'a' `elem` s)
  (do
    putStrLn $ "Check if 'b' is in " ++ s
    return $ 'b' `elem` s)

or

containsAandB :: String -> IO Bool
containsAandB s = do
  let containsA = do
        putStrLn $ "Check if 'a' is in " ++ s
        return $ 'a' `elem` s
  let containsB = do
        putStrLn $ "Check if 'b' is in " ++ s
        return $ 'b' `elem` s
  containsA `andM` containsB

(andM is in http://hackage.haskell.org/package/extra-1.6.8/docs/Control-Monad-Extra.html as (&&^), along with other similar functions).
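To confirm that andM really skips the second action, one option is to record which checks were performed. The following is a minimal sketch using only base; checkedElem and runChecks are hypothetical helpers, and the log stands in for the putStrLn calls so the behavior can be inspected as a value.

```haskell
import Data.IORef

andM :: Monad m => m Bool -> m Bool -> m Bool
andM m1 m2 = do
  x <- m1
  if x then m2 else return False

-- Hypothetical helper: note in the log that character c was checked,
-- then return whether it occurs in s.
checkedElem :: IORef [Char] -> Char -> String -> IO Bool
checkedElem logRef c s = do
  modifyIORef logRef (++ [c])
  return (c `elem` s)

-- Returns the combined result and the characters that were checked.
runChecks :: String -> IO (Bool, [Char])
runChecks s = do
  logRef <- newIORef []
  result <- checkedElem logRef 'a' s `andM` checkedElem logRef 'b' s
  checked <- readIORef logRef
  return (result, checked)
```

With this sketch, runChecks "def" gives (False, "a"), so the check for 'b' never ran, while runChecks "abc" gives (True, "ab").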

Alexey Romanov answered Jan 06 '23 07:01