I just started learning Haskell. Below is some code written in an imperative style that implements a simple server -- it prints out the HTTP request headers. Besides the fact that I need to rethink it in Haskell, to work with lazy lists and higher order functions, I'd like to see clearly why it does not do what I intended. It is always one behind -- I hit it with a request, nothing happens, hit it again, it prints the first request, hit it the 3rd time, it prints the 2nd request, etc. Why is that? And what is a minimal change to this code that would cause it to print right when the request came in?
import Network
import System.IO
import Network.HTTP.Headers
acceptLoop :: Socket -> IO ()
acceptLoop s = do
(handle, hostname, _) <- accept s
putStrLn ("Accepted connection from " ++ hostname)
text <- hGetContents handle
let lns = lines text
hds = tail lns
print $ parseHeaders hds
hClose handle
acceptLoop s
main :: IO ()
main = do
s <- listenOn (PortNumber 8080)
acceptLoop s
thanks, Rob
All the answers were helpful. The code below works, but does not use bytestrings, as suggested, yet. A followup question: can ioTakeWhile
be replaced by using some functions from the standard libraries, maybe in Control.Monad?
ioTakeWhile :: (a -> Bool) -> [IO a] -> IO [a]
ioTakeWhile pred actions = do
x <- head actions
if pred x
then (ioTakeWhile pred (tail actions)) >>= \xs -> return (x:xs)
else return []
acceptLoop :: Socket -> IO ()
acceptLoop s = do
(handle, hostname, _) <- accept s
putStrLn ("Accepted connection from " ++ hostname)
let lineActions = repeat (hGetLine handle)
lines <- ioTakeWhile (/= "\r") lineActions
print lines
hClose handle
Lazy evaluation is a method to evaluate a Haskell program. It means that expressions are not evaluated when they are bound to variables, but their evaluation is deferred until their results are needed by other computations.
Haskell is a lazy language. It does not evaluate expressions until it absolutely must. This frequently allows our programs to save time by avoiding unnecessary computation, but they are at more of a risk to leak memory. There are ways of introducing strictness into our programs when we don't want lazy evaluation.
In spite of Haskell being purely functional, IO actions can be said to be impure because their impacts on the outside world are side effects (as opposed to the regular effects that are entirely contained within Haskell).
Haskell is a pure language Moreover, Haskell functions can't have side effects, which means that they can't effect any changes to the "real world", like changing files, writing to the screen, printing, sending data over the network, and so on.
Your problem is using hGetContents
will get all contents on the handle till the socket closes. You follow this call by trying to parse the last line of the input, which won't be known till the connection terminates.
The solution: get as much data as you need (or is available) then terminate the connection.
It's late and I'm tired but here's a solution I know is non-optimal (read: ugly as sin): You can move to bytestrings (should do this anyway) and use hGetNonBlocking or hGetSome
instead of hGetContents
. Alternatively, you can hGetLine
(blocking) continually till the parse succeeds to your satisfaction:
import Network
import System.IO
import Network.HTTP.Headers
import Control.Monad
import qualified Data.ByteString.Char8 as B
import Data.ByteString (hGetSome)
acceptLoop :: Socket -> IO ()
acceptLoop s = do
(handle, hostname, _) <- accept s
putStrLn ("Accepted connection from " ++ hostname)
printHeaders handle B.empty
hClose handle
where
printHeaders h s = do
t <- hGetSome h 4096
let str = B.append s t -- inefficient!
loop = printHeaders h str
case (parseHeaders . tail . lines) (B.unpack str) of
Left _ -> loop
Right x
| length x < 3 -> loop
| otherwise -> print x
main :: IO ()
main = do
hSetBuffering stdin NoBuffering
s <- listenOn (PortNumber 8080)
forever $ acceptLoop s
The "flow of control" in lazy programs is different than you're used to. Things won't be evaluated until they have to which is why your program is always a request behind with the output.
In general, you can make something strict by using the "bang" operator !
and the BangPatterns
pragma.
If you use it in this case (by saying !text <- hGetContents handle
) you will get the output of the headers once the request is finished. Unfortunately, hGetContents
doesn't know when to stop waiting for more data before the print
statement, because handle
is not closed.
If you additionally restructure the program to have the hClose handle
before both the let
statement, and print
, then the program behaves like you want.
In the other case, the print
is not evaluated because the value of text
is never "finalized" by the closing of handle
. Since it's "lazy", print
is then waiting on hds
and lns
, which are in turn waiting on text
, which is waiting on hClose
... which is why you were getting the weird behaviour; hClose
was not being evaluated until the socket was needed by the next request, which is why there was no output until then.
Note that simply making text
strict will still block the program forever, leaving it "waiting" for the file to close. Yet, if the file is closed when text
is non-strict, it will always be empty, and cause an error. Using both together will get the desired effect.
Three changes were made: I added the {-# LANGUAGE BangPatterns #-}
pragma, a single character (!
) in front of text
, and moved hClose handle
up a few lines.
{-# LANGUAGE BangPatterns #-}
import Network
import System.IO
import Network.HTTP.Headers
acceptLoop :: Socket -> IO ()
acceptLoop s = do
(handle, hostname, _) <- accept s
putStrLn ("Accepted connection from " ++ hostname)
!text <- hGetContents handle
hClose handle
let lns = lines text
hds = tail lns
print $ parseHeaders hds
acceptLoop s
main :: IO ()
main = do
s <- listenOn (PortNumber 8080)
acceptLoop s
To sidestep issues like this altogether, you can try using the hGetContents
function from the System.IO.Strict
module instead of System.IO
.
Rather than explicit recursion in acceptLoop
, I find the following main
to be more idiomatic:
main = do
s <- listenOn (PortNumber 8080)
sequence_ $ repeat $ acceptLoop s
Doing this, you can remove the recursive call from acceptLoop
.
TomMD's solution uses forever
from the Contol.Monad
module, which is good too.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With