I like to parse strings ad hoc in Python by just pasting into the interpreter.
>>> s = """Adams, John
... Washington,George
... Lincoln,Abraham
... Jefferson, Thomas
... """
>>> print "\n".join(x.split(",")[1].replace(" ", "")
...                 for x in s.strip().split("\n"))
John
George
Abraham
Thomas
This works great in the Python interpreter, but I'd like to do the same thing in Haskell/GHCi. The problem is that I can't paste multi-line strings. I can use getContents with an EOF character, but only once, since the EOF character closes stdin.
Prelude> s <- getContents
Prelude> s
"Adams, John
Adams, John\nWashington,George
Washington,George\nLincoln,Abraham
Lincoln,Abraham\nJefferson, Thomas
Jefferson, Thomas\n^Z
"
Prelude> :{
Prelude| putStr $ unlines $ map ((filter (`notElem` ", "))
Prelude| . snd . (break (==','))) $ lines s
Prelude| :}
John
George
Abraham
Thomas
Prelude> x <- getContents
*** Exception: <stdin>: hGetContents: illegal operation (handle is closed)
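The parsing step itself is pure, so it can be pulled out of the GHCi transcript above into a named function and reused on any string without touching stdin. A minimal sketch: the names `firstNames` and `sample` are hypothetical, and the function keeps the transcript's behaviour of taking the text after the first comma and deleting any commas and spaces left in it.

```haskell
-- Pure version of the parse typed into GHCi above (firstNames is a
-- hypothetical name). For each line: split at the first comma, keep the
-- part from the comma onwards, then delete every comma and space in it.
firstNames :: String -> [String]
firstNames = map (filter (`notElem` ", ") . snd . break (== ',')) . lines

-- The question's data as an ordinary string literal (hypothetical name).
sample :: String
sample = "Adams, John\nWashington,George\nLincoln,Abraham\nJefferson, Thomas\n"
```

These definitions can be pasted into GHCi between :{ and :}; after that, putStr (unlines (firstNames sample)) prints John, George, Abraham and Thomas on separate lines.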
Is there a better way to go about doing this with GHCi? Note: my understanding of getContents (and Haskell IO in general) is probably severely broken.
UPDATED
I will be playing with the answers I have received. Here are some helper functions I made (plagiarized) from ephemient's answer that simulate Python's """ quoting (by ending with """, not starting with it).
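-- needs import Control.Monad (liftM), plus takeWhileM from ephemient's answer
getLinesWhile :: (String -> Bool) -> IO String
getLinesWhile p = liftM unlines $ takeWhileM p (repeat getLine)
-- read lines until one consisting of exactly """ is entered
getLines :: IO String
getLines = getLinesWhile (/="\"\"\"")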
To use AndrewC's answer in GHCi:
C:\...\code\haskell> ghci HereDoc.hs -XQuasiQuotes
ghci> :{
*HereDoc| let s = [heredoc|
*HereDoc| Adams, John
*HereDoc| Washington,George
*HereDoc| Lincoln,Abraham
*HereDoc| Jefferson, Thomas
*HereDoc| |]
*HereDoc| :}
ghci> putStrLn s
Adams, John
Washington,George
Lincoln,Abraham
Jefferson, Thomas
ghci> :{
*HereDoc| putStr $ unlines $ map ((filter (`notElem` ", "))
*HereDoc| . snd . (break (==','))) $ lines s
*HereDoc| :}
John
George
Abraham
Thomas
getContents == hGetContents stdin. Unfortunately, hGetContents marks its handle as (semi-)closed, which means anything attempting to read from stdin ever again will fail.
Does it suffice to simply read up to an empty line or some other marker, never closing stdin?
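import Control.Monad (liftM)
-- Run the actions in order, stopping at the first result that fails p.
-- Only the actions up to that point are run, so repeat getLine is safe.
takeWhileM :: Monad m => (a -> Bool) -> [m a] -> m [a]
takeWhileM p (ma : mas) = do
    a <- ma
    if p a
      then liftM (a :) $ takeWhileM p mas
      else return []
takeWhileM _ _ = return []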
ghci> liftM unlines $ takeWhileM (not . null) (repeat getLine)
Adams, John
Washington, George
Lincoln, Abraham
Jefferson, Thomas

"Adams, John\nWashington, George\nLincoln, Abraham\nJefferson, Thomas\n"
ghci>
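Two properties of the takeWhileM approach above can be checked without a terminal: it only runs the actions up to and including the first line the predicate rejects, even though repeat getLine is infinite, and it never calls getContents, so stdin stays open for the next read. The sketch below repeats takeWhileM, adds a hypothetical single-action variant readWhile, and swaps getLine for a hypothetical IORef-backed fakeGetLine that also counts how often it runs. None of these extra names come from the original answers.

```haskell
import Control.Monad (liftM)
import Data.IORef (IORef, newIORef, readIORef, writeIORef, modifyIORef)

-- takeWhileM from the answer above, repeated so this block stands alone.
takeWhileM :: Monad m => (a -> Bool) -> [m a] -> m [a]
takeWhileM p (ma : mas) = do
  a <- ma
  if p a
    then liftM (a :) $ takeWhileM p mas
    else return []
takeWhileM _ _ = return []

-- Hypothetical variant: repeat one action until its result fails p.
-- readWhile p getLine behaves like takeWhileM p (repeat getLine).
readWhile :: Monad m => (a -> Bool) -> m a -> m [a]
readWhile p act = do
  a <- act
  if p a
    then liftM (a :) (readWhile p act)
    else return []

-- Hypothetical stand-in for getLine: pops the next line from an IORef
-- (or returns "" when none are left) and bumps a call counter.
fakeGetLine :: IORef [String] -> IORef Int -> IO String
fakeGetLine linesRef countRef = do
  modifyIORef countRef (+ 1)
  ls <- readIORef linesRef
  case ls of
    (l : rest) -> writeIORef linesRef rest >> return l
    []         -> return ""
```

With the input "Adams, John", "Washington, George", "", "not read", both functions return the first two lines after exactly three reads, leaving "not read" unconsumed. In GHCi, readWhile (not . null) getLine can be run as many times as you like, because each call stops at a blank line rather than at EOF.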
If you do this a lot, and you're writing helper functions in some module anyway, why not go the whole hog and use your editor for the raw data too:
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
module ParseAdHoc where
import HereDoc
import Data.Char (isSpace)
import Data.List (intercalate,intersperse) -- other handy helpers
-- ------------------------------------------------------
-- edit this bit every time you do your ad-hoc parsing
adhoc :: String -> String
adhoc = head . splitOn ',' . rmspace  -- keeps the surname; use last instead of head for the first name
input = [heredoc|
Adams, John
Washington,George
Lincoln,Abraham
Jefferson, Thomas
|]
-- ------------------------------------------------------
-- add other helpers you'll reuse here
main = mapM_ putStrLn.map adhoc.lines $ input
rmspace = filter (not.isSpace)
splitWith :: (a -> Bool) -> [a] -> [[a]] -- splits using a function that tells you when
splitWith isSplitter list = case dropWhile isSplitter list of
    [] -> []
    thisbit -> firstchunk : splitWith isSplitter therest
        where (firstchunk, therest) = break isSplitter thisbit
splitOn :: Eq a => a -> [a] -> [[a]] -- splits on the given item
splitOn c = splitWith (== c)
splitsOn :: Eq a => [a] -> [a] -> [[a]] -- splits on any of the given items
splitsOn chars = splitWith (`elem` chars)
It would be easier to use takeWhile (/=',') instead of head . splitOn ',', but I thought that splitOn would be more useful to you in the future.
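One detail worth knowing before reusing splitOn: because splitWith calls dropWhile before every chunk, runs of separators collapse and leading or trailing separators never produce empty fields. That is also where head . splitOn ',' and takeWhile (/=',') part ways, on a line that starts with a comma. A minimal illustration, with splitWith and splitOn copied unchanged from the module above:

```haskell
-- splitWith and splitOn copied from the ParseAdHoc module above.
splitWith :: (a -> Bool) -> [a] -> [[a]]
splitWith isSplitter list = case dropWhile isSplitter list of
  [] -> []
  thisbit -> firstchunk : splitWith isSplitter therest
    where (firstchunk, therest) = break isSplitter thisbit

splitOn :: Eq a => a -> [a] -> [[a]]
splitOn c = splitWith (== c)

-- Separators are skipped before each chunk, so no field is ever empty:
--   splitOn ',' "Adams,John"    == ["Adams","John"]
--   splitOn ',' "a,,b"          == ["a","b"]
--   splitOn ',' ",John"         == ["John"]
-- whereas takeWhile stops at the very first comma:
--   takeWhile (/= ',') ",John"  == ""
```

If you need empty fields preserved (for CSV-like data with blank columns, say), splitWith is not the right tool as written.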
This uses a helper module, HereDoc, that lets you paste multiline string literals into your code (like Perl's <<"EOF" or Python's """). I can't remember how I found out how to do this, but I've tweaked it to remove whitespace-only first and last lines, so I can start and end my data with a newline.
module HereDoc where
import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Data.Char (isSpace)
{-
example1 = [heredoc|Hi.
This is a multi-line string.
It should appear as an ordinary string literal.
Remember you can only use a QuasiQuoter
in a different module, so import this HereDoc module
into something else and don't forget the
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}|]
example2 = [heredoc|
This heredoc has no newline characters in it because empty or whitespace-only first and last lines are ignored
|]
-}
heredoc :: QuasiQuoter
heredoc = QuasiQuoter { quoteExp  = stringE . topAndTail,
                        quotePat  = litP . stringL,
                        quoteType = undefined,
                        quoteDec  = undefined }
topAndTail :: String -> String
topAndTail = myunlines . tidyend . tidyfront . lines
tidyfront :: [String] -> [String]
tidyfront [] = []
tidyfront (xs:xss) | all isSpace xs = xss
                   | otherwise      = xs:xss
tidyend :: [String] -> [String]
tidyend [] = []
tidyend [xs] | all isSpace xs = []
             | otherwise      = [xs]
tidyend (xs:xss) = xs:tidyend xss
myunlines :: [String] -> String
myunlines [] = ""
myunlines (l:ls) = l ++ concatMap ('\n':) ls
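The trimming in topAndTail is ordinary list code, so it can be tried out without Template Haskell. Incidentally, myunlines behaves exactly like intercalate "\n" from Data.List (unlike unlines, it adds no trailing newline). Here is a sketch of an equivalent function built from library pieces; the name topAndTail' is hypothetical. Like the original, it drops at most one whitespace-only line at each end.

```haskell
import Data.Char (isSpace)
import Data.List (intercalate)

-- Hypothetical equivalent of HereDoc's topAndTail: drop at most one
-- whitespace-only line at the front and one at the back, then join the
-- remaining lines with '\n' and no trailing newline.
topAndTail' :: String -> String
topAndTail' = intercalate "\n" . dropLastBlank . dropFirstBlank . lines
  where
    blank = all isSpace
    dropFirstBlank (l : ls) | blank l = ls
    dropFirstBlank ls = ls
    dropLastBlank ls
      | not (null ls) && blank (last ls) = init ls
      | otherwise = ls
```

For example, topAndTail' "\nHi\n  " gives "Hi", while topAndTail' "\n\nHi\n" gives "\nHi", because only one leading blank line is removed.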
You might find Data.Text a good source of (inspiration for) helper functions: http://hackage.haskell.org/packages/archive/text/latest/doc/html/Data-Text.html
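As an illustration of that suggestion (not part of the original answer), here is the question's parse written against Data.Text using T.lines, T.splitOn and T.strip. The function name firstNamesT is hypothetical, and the text package must be installed; it usually ships with GHC. Note that T.strip only trims spaces at the ends of each field, whereas the Python version removes every space.

```haskell
import qualified Data.Text as T

-- Hypothetical Data.Text version of the ad hoc parse: split each line on
-- commas and keep the stripped second field. Lines without a comma are
-- skipped by the pattern match in the list comprehension.
firstNamesT :: T.Text -> [T.Text]
firstNamesT input =
  [ T.strip rest
  | line <- T.lines input
  , (_ : rest : _) <- [T.splitOn (T.pack ",") line]
  ]
```

Applied to the question's four lines (packed with T.pack), it yields John, George, Abraham and Thomas; map T.unpack converts the results back to String if needed.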