
Haskell Curl Help

Ok, I'm trying to wrap my head around IO in Haskell, and I figured I'd write a short little app dealing with web pages to do it. The snippet I'm getting tripped up on is (with apologies to bobince, though to be fair, I'm not trying to parse HTML here, just extract one or two values):

titleFromUrl url = do
    (_, page) <- curlGetString url [CurlTimeout 60]   
    matchRegex (mkRegexWithOpts "<title>(.*?)</title>" False True) page

The above should take a URL in string form, scan the page it points to with matchRegex, and return either Nothing or Just [a], where a is the matched (possibly multi-line) string. The frustrating thing is that when I try doing

Prelude> (_, page) <- curlGetString url [CurlTimeout 60]
Prelude> matchRegex (mkRegexWithOpts "<title>(.*?)</title>" False True) page

in the interpreter, it does precisely what I want it to. When I load the same expression and its associated imports from a file, I get a type error stating that it couldn't match expected type 'IO b' against inferred type 'Maybe [String]'. This tells me I'm missing something small and fundamental, but I can't figure out what. I've tried explicitly casting page to a string, but that's just programming by superstition (and it didn't work in any case).

Any hints?

Inaimathi asked Nov 16 '10 04:11


1 Answer

Yeah, GHCi accepts any sort of value at its prompt, not just IO actions. You can say:

ghci> 4
4
ghci> print 4
4

But those two values (4 and print 4) are clearly not the same kind of thing: one is a number and the other is an IO action. The magic GHCi is doing is that if what you typed evaluates to an IO something, it executes that action (and prints the result if something is not ()). If it doesn't, it calls show on the value and prints that. Anyway, this magic is not available inside your program.
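To make the difference concrete, here is a minimal sketch of what a program has to spell out for itself. The names four, shownFour, and printFour are made up for illustration; the only library functions used are show and print from the Prelude.

```haskell
-- A pure value. At the GHCi prompt it would be displayed via show;
-- inside a program nothing happens to it unless you ask.
four :: Int
four = 4

-- The string GHCi would build for a non-IO value before printing it.
shownFour :: String
shownFour = show four

-- An IO action. GHCi would run this directly; in a program it only
-- runs when it is sequenced into main (or into another IO action).
printFour :: IO ()
printFour = print four
```

Loading this file into GHCi and typing four or printFour displays the same text, but only printFour could appear as a statement in an IO do block.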

When you say:

do foo <- bar :: IO Int
   baz

baz is expected to be of type IO something, and it's a type error otherwise. If anything else were allowed there, you could execute I/O and then hand back a pure value, which is exactly what the IO type is meant to prevent. You can check this by noting that the above desugars to:

bar >>= (\foo -> baz)

And the type of >>= is:

-- (specializing to IO for simplicity)
(>>=) :: IO a -> (a -> IO b) -> IO b

Therefore

bar :: IO a
foo :: a
baz :: IO b
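As a rough sketch of that equivalence, the two definitions below should typecheck and behave the same. The names sugared and desugared are invented for this example, and baz is written as a function of foo so that both versions can take it as a parameter.

```haskell
-- The do-block form: the last statement, baz foo, must be an IO value.
sugared :: IO a -> (a -> IO b) -> IO b
sugared bar baz = do
    foo <- bar
    baz foo

-- The same thing written directly with >>=.
desugared :: IO a -> (a -> IO b) -> IO b
desugared bar baz = bar >>= (\foo -> baz foo)
```

For instance, sugared getLine putStrLn and desugared getLine putStrLn both read a line and echo it back. Passing a baz whose result is not in IO is rejected by the type checker in both forms.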

The way to fix it is to turn your return value into an IO value using the return function:

return :: a -> IO a  -- (again specialized to IO)

Your code is then:

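import Network.Curl (curlGetString, CurlOption(..))  -- from the curl package
import Text.Regex (matchRegex, mkRegexWithOpts)      -- from regex-compat

titleFromUrl :: String -> IO (Maybe [String])
titleFromUrl url = do
    (_, page) <- curlGetString url [CurlTimeout 60]
    return $ matchRegex (mkRegexWithOpts "<title>(.*?)</title>" False True) page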
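One way to see the pattern without a network connection is to keep the extraction pure and lift it into IO at the last step. The sketch below is not the code from the question: fakeFetch is a hypothetical stand-in for curlGetString that returns a canned page, and the regex is swapped for a plain string search so that only base is needed.

```haskell
import Data.List (isPrefixOf, stripPrefix, tails)
import Data.Maybe (listToMaybe, mapMaybe)

-- Hypothetical stand-in for curlGetString: no network, just a canned page.
fakeFetch :: String -> IO String
fakeFetch _url = return "<html><title>Example Page</title></html>"

-- Everything after the first occurrence of tag, if there is one.
firstAfter :: String -> String -> Maybe String
firstAfter tag s = listToMaybe (mapMaybe (stripPrefix tag) (tails s))

-- Everything before the first occurrence of tag, if there is one.
upTo :: String -> String -> Maybe String
upTo _ [] = Nothing
upTo tag s@(c:cs)
    | tag `isPrefixOf` s = Just []
    | otherwise          = (c :) <$> upTo tag cs

-- Pure: the text between <title> and </title>. No IO in the type.
extractTitle :: String -> Maybe String
extractTitle page = do
    rest <- firstAfter "<title>" page
    upTo "</title>" rest

-- The pure result is wrapped with return so the do block ends in an IO value.
titleFromPage :: String -> IO (Maybe String)
titleFromPage url = do
    page <- fakeFetch url
    return (extractTitle page)

-- Equivalent, using fmap instead of an explicit do block.
titleFromPage' :: String -> IO (Maybe String)
titleFromPage' url = fmap extractTitle (fakeFetch url)
```

Splitting things this way means extractTitle can be tried at the prompt on any string, and only the thin wrapper needs to live in IO.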

For most of the discussion above, you can substitute any monad for IO (e.g. Maybe, [], ...) and it will still hold.
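A small sketch of the same rule in two other monads, with made-up names (safeDiv, halfOfQuotient, pairs): in each do block the last statement has to be a value in that block's monad, and return is what wraps a plain value.

```haskell
-- Maybe: division that fails on a zero divisor.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- The last line must be a Maybe Int, so the pure result is wrapped with return.
halfOfQuotient :: Int -> Int -> Maybe Int
halfOfQuotient x y = do
    q <- safeDiv x y
    return (q `div` 2)

-- List: here return builds a one-element list for each combination.
pairs :: [(Int, Char)]
pairs = do
    n <- [1, 2]
    c <- "ab"
    return (n, c)
```

Dropping the return in halfOfQuotient gives the same kind of error as in the question, only with Maybe in place of IO.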

luqui answered Sep 20 '22 00:09