
How do I make a do block return early?

I'm trying to scrape a webpage using Haskell and compile the results into an object.

If, for whatever reason, I can't get all the items from the page, I want to stop processing it and return early.

For example:

scrapePage :: String -> IO ()
scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  when (isNothing title) (return ())
  date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  when (isNothing date) (return ())
  -- etc
  -- make page object and send it to db
  return ()

The problem is that the when call doesn't stop the do block or keep the remaining statements from being executed.

What is the right way to do this?

Joe Hillenbrand asked Mar 15 '13 20:03



3 Answers

return in Haskell does not do the same thing as return in other languages. Instead, return injects a value into a monad (in this case IO); it does not exit the enclosing do block. You have a couple of options.
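To see concretely that return does not leave the block, here is a small sketch using only base. The names step and noEarlyExit are made up for illustration; the IORef just records which statements ran.

```haskell
import Control.Monad (when)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Append a step name to the log (hypothetical helper for this sketch).
step :: IORef [String] -> String -> IO ()
step ref s = modifyIORef ref (++ [s])

-- `when True (return ())` is an action that does nothing;
-- the statements after it are still sequenced and run.
noEarlyExit :: IO [String]
noEarlyExit = do
  ref <- newIORef []
  step ref "before"
  when True (return ())
  step ref "after"
  readIORef ref
```

Evaluating noEarlyExit in GHCi gives a log containing both "before" and "after", so the statement following the when was executed.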

The simplest is to use if:

scrapePage :: String -> IO ()
scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  if (isNothing title) then return () else do
   date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
   if (isNothing date) then return () else do
     -- etc
     -- make page object and send it to db
     return ()

Another option is to use unless:

scrapePage url = do
  doc <- fromUrl url
  title <- liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  unless (isNothing title) $ do
    date <- liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
    unless (isNothing date) $ do
      -- etc
      -- make page object and send it to db
      return ()

The general problem here is that the IO monad doesn't have control effects (except for exceptions). You can, however, add them with MaybeT, the Maybe monad transformer:

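import Control.Monad (guard, liftM)
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Maybe (runMaybeT)
import Data.Maybe (isJust)

scrapePage :: String -> IO ()
scrapePage url = liftM (maybe () id) . runMaybeT $ do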
  doc <- liftIO $ fromUrl url
  title <- liftIO $ liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  guard (isJust title)
  date <- liftIO $ liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  guard (isJust date)
  -- etc
  -- make page object and send it to db
  return ()
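A self-contained sketch of the same idea, needing only base and transformers. Here lookupField is a hypothetical stand-in for a scraping step, and the page is faked as an association list. Wrapping an IO (Maybe a) in the MaybeT constructor binds the value directly, so no guard or isJust check is needed.

```haskell
import Control.Monad.IO.Class (liftIO)
import Control.Monad.Trans.Maybe (MaybeT (..), runMaybeT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

-- Hypothetical stand-in for a scraping step: look a field up in a fake page.
lookupField :: String -> [(String, String)] -> IO (Maybe String)
lookupField key page = return (lookup key page)

-- Returns Just (title, date) when both fields exist.
-- The first Nothing aborts every statement after it.
scrape :: IORef [String] -> [(String, String)] -> IO (Maybe (String, String))
scrape logRef page = runMaybeT $ do
  title <- MaybeT (lookupField "title" page)
  liftIO (modifyIORef logRef (++ ["got title"]))
  date <- MaybeT (lookupField "date" page)
  liftIO (modifyIORef logRef (++ ["got date"]))
  return (title, date)
```

With a page that has a title but no date, scrape returns Nothing and the log contains only "got title".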

If you really want full-blown control effects, you need to use ContT:

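import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont (callCC, evalContT)

scrapePage :: String -> IO ()
scrapePage url = evalContT $ callCC $ \exit -> do
  doc <- lift $ fromUrl url
  title <- lift $ liftM headMay $ runX $ doc >>> css "head.title" >>> getText
  -- exit is the continuation of the whole block, so calling it skips the rest
  when (isNothing title) $ exit ()
  date <- lift $ liftM headMay $ runX $ doc >>> css "span.dateTime" ! "data-utc"
  when (isNothing date) $ exit ()
  -- etc
  -- make page object and send it to db
  return ()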
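For the early exit to work, callCC has to wrap the whole block, so that the continuation it hands you is "everything after the block". A minimal sketch with transformers only; process and logRef are illustrative names, and the Maybe arguments stand in for the scraped values.

```haskell
import Control.Monad (when)
import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.Cont (callCC, evalContT)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Maybe (isNothing)

-- Calling `exit ()` jumps to the end of the callCC block,
-- skipping all remaining statements inside it.
process :: IORef [String] -> Maybe String -> Maybe String -> IO ()
process logRef title date = evalContT $ callCC $ \exit -> do
  when (isNothing title) $ exit ()
  lift (modifyIORef logRef (++ ["title ok"]))
  when (isNothing date) $ exit ()
  lift (modifyIORef logRef (++ ["date ok"]))
```

Calling process with a title but no date leaves only "title ok" in the log.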

WARNING: none of the above code has been tested, or even type checked!

Philip JF answered Oct 16 '22 00:10


Use a monad transformer!

import Control.Monad.Trans.Class -- from transformers package
import Control.Error.Util        -- from errors package

scrapePage :: String -> IO ()
scrapePage url = maybeT (return ()) return $ do
  doc <- lift $ fromUrl url
  title <- liftM headMay $ lift . runX $ doc >>> css "head.title" >>> getText
  guard . not $ isNothing title
  date <- liftM headMay $ lift . runX $ doc >>> css "span.dateTime" ! "data-utc"
  guard . not $ isNothing date
  -- etc
  -- make page object and send it to db
  return ()

For more flexibility in the return value when you early return, use throwError/eitherT/EitherT instead of mzero/maybeT/MaybeT. (Although then you can't use guard.)
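A sketch of the error-carrying variant. It uses ExceptT and throwE from the transformers package in place of EitherT and throwError; that substitution is an assumption made so the example depends only on transformers. The helper require is made up for illustration.

```haskell
import Control.Monad.Trans.Except (ExceptT, runExceptT, throwE)

-- Hypothetical helper: fail with the given error when the value is missing.
require :: Monad m => e -> Maybe a -> ExceptT e m a
require err = maybe (throwE err) return

-- Stops at the first missing field and reports which one it was.
buildPage :: Maybe String -> Maybe String -> IO (Either String (String, String))
buildPage mTitle mDate = runExceptT $ do
  title <- require "missing title" mTitle
  date  <- require "missing date" mDate
  return (title, date)
```

Unlike the MaybeT version, the caller gets a Left value saying why processing stopped.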

(Probably also use headZ instead of headMay and ditch the explicit guard.)

dave4420 answered Oct 16 '22 01:10


I have never worked with Haskell, but it seems quite easy. Try when (isNothing date) $ exit (). If that doesn't work either, make sure your statement is correct. Also see this website for more info: Breaking From loop.

Nick van Tilborg answered Oct 16 '22 00:10