Haskell GHCi - Using EOF character on stdin with getContents

I like to parse strings ad hoc in Python by just pasting into the interpreter.

>>> s = """Adams, John
... Washington,George
... Lincoln,Abraham
... Jefferson, Thomas
... """
>>> print("\n".join(x.split(",")[1].replace(" ", "")
...                 for x in s.strip().split("\n")))
John
George
Abraham
Thomas

This works great using the Python interpreter, but I'd like to do this with Haskell/GHCi. Problem is, I can't paste multi-line strings. I can use getContents with an EOF character, but I can only do it once since the EOF character closes stdin.

Prelude> s <- getContents
Prelude> s
"Adams, John
Adams, John\nWashington,George
Washington,George\nLincoln,Abraham
Lincoln,Abraham\nJefferson, Thomas
Jefferson, Thomas\n^Z
"
Prelude> :{
Prelude| putStr $ unlines $ map ((filter (`notElem` ", "))
Prelude|                         . snd . (break (==','))) $ lines s
Prelude| :}
John
George
Abraham
Thomas
Prelude> x <- getContents
*** Exception: <stdin>: hGetContents: illegal operation (handle is closed)

Is there a better way to go about doing this with GHCi? Note - my understanding of getContents (and Haskell IO in general) is probably severely broken.

UPDATED

I will be playing with the answers I have received. Here are some helper functions I made (plagiarized from ephemient's answer) that simulate Python's """ quoting, except that the input is terminated by a line containing """ rather than introduced by one.

getLinesWhile :: (String -> Bool) -> IO String
getLinesWhile p = liftM unlines $ takeWhileM p (repeat getLine)

getLines :: IO String
getLines = getLinesWhile (/="\"\"\"")

To use AndrewC's answer in GHCi -

C:\...\code\haskell> ghci HereDoc.hs -XQuasiQuotes
ghci> :{
*HereDoc| let s = [heredoc|
*HereDoc| Adams, John
*HereDoc| Washington,George
*HereDoc| Lincoln,Abraham
*HereDoc| Jefferson, Thomas
*HereDoc| |]
*HereDoc| :}
ghci> putStrLn s
Adams, John
Washington,George
Lincoln,Abraham
Jefferson, Thomas
ghci> :{
*HereDoc| putStr $ unlines $ map ((filter (`notElem` ", "))
*HereDoc|                         . snd . (break (==','))) $ lines s
*HereDoc| :}
John
George
Abraham
Thomas

asked Aug 25 '12 05:08 by pyrospade


2 Answers

getContents == hGetContents stdin. Unfortunately, hGetContents marks its handle as (semi-)closed, which means anything attempting to read from stdin ever again will fail.
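
The semi-closed state can be observed directly. The following is a minimal sketch, assuming a compiled program (or runghc) with stdin attached; the name stdinOpenAroundGetContents is made up for illustration, while hIsOpen and stdin come from System.IO.

```haskell
import System.IO (hIsOpen, stdin)

-- Hypothetical helper, not part of the answer: report whether stdin counts
-- as open before and after getContents is called on it.
stdinOpenAroundGetContents :: IO (Bool, Bool)
stdinOpenAroundGetContents = do
  before <- hIsOpen stdin
  _ <- getContents        -- lazy: nothing is forced yet, but the handle is now semi-closed
  after <- hIsOpen stdin  -- hIsOpen does not treat a semi-closed handle as open
  return (before, after)
```

The second component should come back False even though no input has been consumed yet, which is why a later getLine or getContents fails. Be aware that trying this inside GHCi leaves stdin unusable for the rest of that session, just as in the question.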

Does it suffice to simply read up to an empty line or some other marker, never closing stdin?

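import Control.Monad (liftM)

-- Run the actions one at a time, keeping results while the predicate holds.
-- Stops at (and discards) the first result that fails the predicate.
takeWhileM :: Monad m => (a -> Bool) -> [m a] -> m [a]
takeWhileM p (ma : mas) = do
    a <- ma
    if p a
      then liftM (a :) $ takeWhileM p mas
      else return []
takeWhileM _ _ = return []

ghci> liftM unlines $ takeWhileM (not . null) (repeat getLine)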
Adams, John
Washington, George
Lincoln, Abraham
Jefferson, Thomas

"Adams, John\nWashington, George\nLincoln, Abraham\nJefferson, Thomas\n"
ghci>

answered Sep 21 '22 12:09 by ephemient


If you do this a lot, and you're writing helper functions in some module anyway, why not go the whole hog and use your editor for the raw data too:

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
module ParseAdHoc where
import HereDoc
import Data.Char (isSpace)
import Data.List (intercalate,intersperse)  -- other handy helpers

-- ------------------------------------------------------
-- edit this bit every time you do your ad-hoc parsing

adhoc :: String -> String
adhoc = head . splitOn ',' . rmspace

input = [heredoc|
Adams, John
Washington,George
Lincoln,Abraham
Jefferson, Thomas
|]

-- ------------------------------------------------------
-- add other helpers you'll reuse here

main = mapM_ putStrLn.map adhoc.lines $ input

rmspace = filter (not.isSpace)

splitWith :: (a -> Bool) -> [a] -> [[a]]   -- splits using a function that tells you when to split
splitWith isSplitter list = case dropWhile isSplitter list of
  [] -> []
  thisbit -> firstchunk : splitWith isSplitter therest
    where (firstchunk, therest) = break isSplitter thisbit

splitOn :: Eq a => a -> [a] -> [[a]]       -- splits on the given item
splitOn c = splitWith (== c)

splitsOn :: Eq a => [a] -> [a] -> [[a]]    -- splits on any of the given items
splitsOn chars = splitWith (`elem` chars)

It would be easier to use takeWhile (/=',') instead of head . splitOn ',', but I thought splitOn would be more useful to you in the future.
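
To make the comparison concrete, here is a small sketch of the two variants side by side. splitWith and splitOn are copied from the module above so the block stands alone; the names viaSplitOn and viaTakeWhile are invented for this illustration only.

```haskell
-- Copied from the answer's module so this block is self-contained.
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith isSplitter list = case dropWhile isSplitter list of
  [] -> []
  thisbit -> firstchunk : splitWith isSplitter therest
    where (firstchunk, therest) = break isSplitter thisbit

splitOn :: Eq a => a -> [a] -> [[a]]
splitOn c = splitWith (== c)

-- Hypothetical names for the two variants being compared.
viaSplitOn :: String -> String
viaSplitOn = head . splitOn ','    -- partial: head fails when the input has no fields

viaTakeWhile :: String -> String
viaTakeWhile = takeWhile (/= ',')  -- total: yields "" when nothing precedes a comma
```

The two agree on any line that has some text before its first comma. They differ at the edges: splitWith skips leading separators, so a line starting with a comma gives the following field under viaSplitOn but an empty string under viaTakeWhile, and an empty line makes head fail.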

This uses a helper module, HereDoc, that lets you paste multiline string literals into your code (like Perl's <<"EOF" or Python's """). I can't remember how I found out how to do this, but I've tweaked it to drop the first and last lines when they are empty or whitespace-only, so I can start and end my data with a newline.

module HereDoc where
import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Data.Char (isSpace)

{-
example1 = [heredoc|Hi.
This is a multi-line string.
It should appear as an ordinary string literal.

Remember you can only use a QuasiQuoter
in a different module, so import this HereDoc module 
into something else and don't forget the
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}|]

example2 = [heredoc|         
This heredoc has no newline characters in it because empty or whitespace-only first and last lines are ignored
                   |]
-}


heredoc = QuasiQuoter {quoteExp = stringE.topAndTail,
                       quotePat = litP . stringL,
                       quoteType = undefined,
                       quoteDec = undefined}

topAndTail = myunlines.tidyend.tidyfront.lines

tidyfront :: [String] -> [String]
tidyfront [] = []
tidyfront (xs:xss) | all isSpace xs = xss
                   | otherwise      = xs:xss

tidyend :: [String] -> [String]
tidyend [] = []
tidyend [xs]     | all isSpace xs = []
                 | otherwise = [xs]
tidyend (xs:xss) = xs:tidyend xss

myunlines :: [String] -> String
myunlines [] = ""
myunlines (l:ls) = l ++ concatMap ('\n':) ls
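
The trimming itself needs no Template Haskell, so it can be tried on plain strings. The sketch below is a compact restatement of topAndTail, not the module's own code: trimEnds is a made-up name, and intercalate "\n" stands in for myunlines.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Hypothetical stand-in for topAndTail: drop the first and the last line
-- when they are empty or whitespace-only, then rejoin without a trailing newline.
trimEnds :: String -> String
trimEnds = intercalate "\n" . dropLastBlank . dropFirstBlank . lines
  where
    dropFirstBlank (l : ls) | all isSpace l = ls
    dropFirstBlank ls = ls
    dropLastBlank ls = case reverse ls of
      (l : rest) | all isSpace l -> reverse rest
      _ -> ls
```

For instance, trimEnds "\nAdams, John\nWashington,George\n" gives "Adams, John\nWashington,George": the leading blank line goes away and the result carries no trailing newline, so it can be fed to lines or putStrLn directly.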

You might find Data.Text a good source of (inspiration for) helper functions: http://hackage.haskell.org/packages/archive/text/latest/doc/html/Data-Text.html

answered Sep 18 '22 12:09 by AndrewC