 

Correctly parsing line indentations in uu-parsinglib in Haskell

I want to create a parser combinator that collects all lines below the current position whose indentation level is greater than or equal to some i. I think the idea is simple:

Consume a line - if its indentation is:

  • ok -> do the same for the next lines
  • wrong -> fail
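As a reference point independent of uu-parsinglib, the rule above can be sketched as a pure function over input that has already been split into lines. This is only an illustration: the names `indentation` and `takeIndented` are hypothetical, a tab is assumed to count as 4 columns (matching `indentOf` in the code below), and instead of failing on a badly indented line the sketch stops and hands the unconsumed lines back to the caller.

```haskell
-- Hypothetical helper: width of a line's leading whitespace.
-- Assumption: a space counts 1 column and a tab counts 4 (as in indentOf below).
indentation :: String -> Int
indentation = sum . map width . takeWhile (`elem` " \t")
  where
    width '\t' = 4
    width _    = 1

-- Hypothetical: collect consecutive lines whose indentation is >= i.
-- span stops at the first badly indented line and returns it, together
-- with everything after it, as the second component instead of failing.
takeIndented :: Int -> [String] -> ([String], [String])
takeIndented i = span (\l -> indentation l >= i)

main :: IO ()
main = do
  print (takeIndented 1 (lines " a\n a\n"))  -- ([" a"," a"],[])
  print (takeIndented 1 (lines " a\na\n"))   -- ([" a"],["a"])
```

Returning the leftover lines leaves it to the caller to decide whether they are an error or simply the start of the next construct.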

Let's consider the following code:

import qualified Text.ParserCombinators.UU as UU
import           Text.ParserCombinators.UU hiding(parse)
import           Text.ParserCombinators.UU.BasicInstances hiding (Parser)

-- end of line
pEOL   = pSym '\n'

pSpace = pSym ' '
pTab   = pSym '\t'

indentOf s = case s of
    ' '  -> 1
    '\t' -> 4

-- returns the indentation level: the width of the leading whitespace (a space counts 1, a tab counts 4)
pIndent = (+) <$> (indentOf <$> (pSpace <|> pTab)) <*> pIndent `opt` 0

-- returns a tuple of (indentation level, result of running the parser p); the line must end with '\n'
pIndentLine p = (,) <$> pIndent <*> p <* pEOL

-- SHOULD collect all following lines whose indentation is greater than or equal to i
myParse p i = do
    (lind, expr) <- pIndentLine p
    if lind < i
        then pFail
        else do
            rest <- myParse p i `opt` []
            return $ expr:rest

-- sample inputs
s1 = " a\
   \\n a\
   \\n"

s2 = " a\
   \\na\
   \\n"

-- execution
pProgram = myParse (pSym 'a') 1 

parse p s = UU.parse ( (,) <$> p <*> pEnd) (createStr (LineColPos 0 0 0) s)

main :: IO ()
main = do 
    print $ parse pProgram s1
    print $ parse pProgram s2
    return ()

This gives the following output:

("aa",[])
Test.hs: no correcting alternative found

The result for s1 is correct. For s2 the parser should consume the first "a" and then stop. Where does this error come from?

Wojciech Danilo asked Aug 14 '13 16:08



1 Answer

The parsers you are constructing always try to proceed: if necessary, input is discarded or inserted. However, pFail is a dead end; it acts as the unit element for <|>.
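The unit-element behaviour is the same law that base's `Alternative` class states for `empty`. The sketch below uses `Maybe` and lists as an analogy, not uu-parsinglib itself: a dead-end branch contributes nothing to a choice, so when it is the only branch left, the whole choice is a dead end.

```haskell
import Control.Applicative (Alternative (..))

-- empty is a left and a right unit for <|>: a dead-end branch adds nothing.
leftUnit, rightUnit :: Maybe Int
leftUnit  = empty <|> Just 1   -- Just 1
rightUnit = Just 1 <|> empty   -- Just 1

-- If every alternative is a dead end, the result is a dead end as well.
onlyDeadEnd :: Maybe Int
onlyDeadEnd = empty <|> empty  -- Nothing

-- The list instance behaves the same way; its empty is [].
listUnit :: [Int]
listUnit = empty <|> [1, 2]    -- [1,2]

main :: IO ()
main = do
  print leftUnit
  print rightUnit
  print onlyDeadEnd
  print listUnit
```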

In your parser, however, there is no other alternative available in case the input does not conform to the language recognised by the parser. In your specification you say you want the parser to fail on input s2. Now it fails, with a message saying that it fails, and you are surprised.

Maybe you do not want it to fail, but rather want it to stop accepting further input? In that case, replace pFail with return [].
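Both behaviours can be compared in a small transcription of the question's parser to `Text.ParserCombinators.ReadP` from base. This is an analogy, not uu-parsinglib: ReadP backtracks instead of repairing input, so failure shows up as an empty result list rather than an error message, and `eof` stands in for `pEnd`. The names `collect`, `failing`, `stopping` and `runToEnd` are hypothetical.

```haskell
import Text.ParserCombinators.ReadP

-- Column width of a leading whitespace character (tab = 4, as in the question).
indentOf :: Char -> Int
indentOf '\t' = 4
indentOf _    = 1

-- Indentation level: total width of the leading spaces and tabs.
pIndent :: ReadP Int
pIndent = sum . map indentOf <$> munch (`elem` " \t")

-- (indentation level, result of p), followed by a newline.
pIndentLine :: ReadP a -> ReadP (Int, a)
pIndentLine p = (,) <$> pIndent <*> p <* char '\n'

-- Same structure as the question's myParse; onBad decides what happens
-- when a line is indented less than i.
collect :: ReadP [a] -> ReadP a -> Int -> ReadP [a]
collect onBad p i = do
  (lind, expr) <- pIndentLine p
  if lind < i
    then onBad
    else (expr :) <$> option [] (collect onBad p i)

failing, stopping :: ReadP a -> Int -> ReadP [a]
failing  = collect pfail       -- the question's version
stopping = collect (return []) -- the suggested replacement

-- Run a collector for 'a' lines at indentation >= 1, requiring end of input.
runToEnd :: (ReadP Char -> Int -> ReadP String) -> String -> [(String, String)]
runToEnd q s = readP_to_S (q (char 'a') 1 <* eof) s

s1, s2 :: String
s1 = " a\n a\n"
s2 = " a\na\n"

main :: IO ()
main = do
  print (runToEnd failing s1)   -- [("aa","")]
  print (runToEnd failing s2)   -- []
  print (runToEnd stopping s2)  -- [("a","")]
  -- Without eof, failing leaves the unindented line in the input.
  print (readP_to_S (failing (char 'a') 1) s2)  -- [("a","a\n")]
```

In this transcription, `failing` has no parse of `s2` that consumes all input, which is what the specification asks for. With `return []` the result is "a", but the unindented second line has already been consumed by `pIndentLine` before the comparison, so it is discarded rather than left for a later parser; keeping it would require checking the indentation before the line is consumed.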

Note that the fragment:

do
    rest <- myParse p i `opt` []
    return $ expr:rest

can be replaced by (expr:) <$> (myParse p i `opt` [])
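The rewrite relies on the standard law that binding a result and returning a function of it is the same as mapping that function: `do { r <- m; return (f r) }` equals `f <$> m`. A quick check with `Maybe` and lists from base (the helper names `viaDo` and `viaFmap` are hypothetical):

```haskell
-- The question's shape: run m, then prepend x to its result.
viaDo :: Monad m => a -> m [a] -> m [a]
viaDo x m = do
  rest <- m
  return (x : rest)

-- The suggested replacement: map (x :) over the result of m.
viaFmap :: Functor m => a -> m [a] -> m [a]
viaFmap x m = (x :) <$> m

main :: IO ()
main = do
  print (viaDo 'a' (Just "bc"), viaFmap 'a' (Just "bc"))  -- (Just "abc",Just "abc")
  print (viaDo 'a' Nothing, viaFmap 'a' Nothing)          -- (Nothing,Nothing)
  print (viaDo 1 [[2], [3, 4 :: Int]], viaFmap 1 [[2], [3, 4 :: Int]])
  -- ([[1,2],[1,3,4]],[[1,2],[1,3,4]])
```

Note that `viaFmap` only needs `Functor`, so the shorter form does not depend on the monadic interface at all.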

A natural way to solve your problem is probably something like

-- the first line determines the indentation i;
-- every following line must start with exactly i spaces
pIndented p = do
    i <- pIndent
    (:) <$> p <* pEOL <*> pMany (pToken (replicate i ' ') *> p <* pEOL)

pIndent = length <$> pMany (pSym ' ')
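The same idea can be tried with `Text.ParserCombinators.ReadP` from base, which offers the needed pieces (`munch`, `string`, `many`, `eof`). This is a transcription for experimentation, not uu-parsinglib, and ReadP backtracks rather than correcting errors; the sample inputs and `p = char 'a'` are assumptions. Because the prefix of `i` spaces is matched before `p`, a line that does not start with `i` spaces is not consumed and stays in the remaining input.

```haskell
import Text.ParserCombinators.ReadP

-- Indentation level: number of leading spaces (spaces only, as in the answer).
pIndent :: ReadP Int
pIndent = length <$> munch (== ' ')

pEOL :: ReadP Char
pEOL = char '\n'

-- The first line fixes the indentation i; each following line must start
-- with exactly i spaces before p is run on it.
pIndented :: ReadP a -> ReadP [a]
pIndented p = do
  i <- pIndent
  (:) <$> p <* pEOL <*> many (string (replicate i ' ') *> p <* pEOL)

main :: IO ()
main = do
  -- Both lines carry the same indentation, so everything is consumed.
  print (readP_to_S (pIndented (char 'a') <* eof) "  a\n  a\n")  -- [("aa","")]
  -- The unindented second line is left in the input instead of being dropped.
  print (readP_to_S (pIndented (char 'a')) " a\na\n")            -- [("a","a\n")]
```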
Doaitse Swierstra answered Sep 29 '22 14:09
