Parsing data with Parsec and omitting comments

I am trying to write a Haskell Parsec parser that parses input data from a file into the LogLine datatype as follows:

--Final parser that holds the individual parsers.
final :: Parser [LogLine]
final = do{ logLines <- sepBy1 logLine eol
        ; return logLines
        }


--The logline token declaration
logLine :: Parser LogLine
logLine = do
    name <- plainValue -- parse the name (identifier)
    many1 space -- parse and throw away the whitespace
    args1 <- bracketedValue -- parse the first list of arguments
    many1 space -- throw away the second run of whitespace
    args2 <- bracketedValue -- parse the second list of arguments
    many1 space -- throw away the third run of whitespace
    constant <- plainValue -- parse the constant identifier
    space
    weighting <- plainValue -- parse the weighting double
    space
    return $ LogLine name args1 args2 constant weighting

It parses everything just fine, but now I need to add comments to the file and modify the parser so that it ignores them. It should support only single-line comments, beginning with "--" and ending with a '\n'. I've tried defining the comment token as follows:

comments :: Parser String
comments = do 
    string "--"
    comment <- (manyTill anyChar newline)
    return ""

And then plugging it into the final parser like so:

final :: Parser [LogLine]
final = do 
        optional comments
        logLines <- sepBy1 logLine (comments<|>newline)
        optional comments
        return logLines

It compiles fine, but it does not parse. I've tried several minor modifications but the best result was parsing everything up to the first comment, so I'm beginning to think that this is not the way to do it. PS: I've seen this Similar Question, but it is slightly different from what I'm trying to achieve.

Atanas Bozhkov asked Oct 07 '22 06:10



1 Answer

If I understand your description of the format in your comment correctly, your example for the format would be

name arg1 arg2 c1 weight
-- comment goes here

optionally followed by further log-lines and/or comments.

Then your problem is that there is a newline between the log-line and the comment line. The comments part of the separator parser fails without consuming input, because comments must start with "--", so newline is tried and succeeds. The next line then begins with "--", which makes plainValue fail without consuming input. By that point the separator has already consumed the newline, so sepBy1 cannot simply stop there, and the parse breaks off at the first comment.
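The failure can be reproduced in isolation. The following is only a minimal sketch: entry is a hypothetical stand-in for logLine (a single alphanumeric word), comment returns () so that it can be combined with newline, and brokenEntries, sample and failsOnComment are illustrative names that do not appear in the question.

```haskell
import Data.Either (isLeft)
import Data.Functor (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for logLine: a single alphanumeric word.
entry :: Parser String
entry = many1 alphaNum

-- Comment parser shaped like the question's, but returning ().
comment :: Parser ()
comment = void (try (string "--") *> manyTill anyChar newline)

-- Separator shaped like the question's (comments <|> newline).
brokenEntries :: Parser [String]
brokenEntries = sepBy1 entry (comment <|> void newline)

-- A comment line sits between two entries.
sample :: String
sample = "first\n-- a comment\nsecond\n"

-- The separator consumes '\n' via newline, then entry meets "--" and fails,
-- so the whole sepBy1 reports a parse error.
failsOnComment :: Bool
failsOnComment = isLeft (parse brokenEntries "" sample)
```

Loading this in GHCi and evaluating failsOnComment should show that the parser rejects the sample, even though each alternative of the separator looks reasonable on its own.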

The solution is to let the separator first consume a newline, and then as many comment lines as follow:

final = do
    skipMany comments
    sepEndBy1 logLine (newline >> skipMany comments)

By allowing the sequence to be ended by a separator (sepEndBy1 instead of sepBy1), any comment lines after the final LogLine are automatically skipped.
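For experimenting with this, here is a self-contained sketch built around the final parser above. Several parts are assumptions, since the question does not show them: the LogLine type (all fields kept as strings), the plainValue and bracketedValue definitions, a logLine that separates fields with spaces only, the try around string "--", and the closing eof, which is added so that leftover input is reported instead of silently ignored.

```haskell
import Data.Functor (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Simplified stand-in for the question's LogLine; every field is a String.
data LogLine = LogLine String String String String String
  deriving (Show, Eq)

-- Hypothetical field parsers; the question does not show its own.
plainValue :: Parser String
plainValue = many1 (alphaNum <|> oneOf "._")

bracketedValue :: Parser String
bracketedValue = between (char '[') (char ']') (many (noneOf "]\n"))

-- Fields are separated by spaces only, so a log line stops before '\n'.
logLine :: Parser LogLine
logLine =
  LogLine
    <$> plainValue <* blanks
    <*> bracketedValue <* blanks
    <*> bracketedValue <* blanks
    <*> plainValue <* blanks
    <*> plainValue
  where
    blanks = skipMany1 (char ' ')

-- A single-line comment: "--" up to and including the newline.
comments :: Parser ()
comments = void (try (string "--") *> manyTill anyChar newline)

-- Leading comments, then log lines separated (and optionally ended)
-- by a newline followed by any number of comment lines.
final :: Parser [LogLine]
final = do
  skipMany comments
  logLines <- sepEndBy1 logLine (newline >> skipMany comments)
  eof -- assumption: require that the whole input was consumed
  return logLines

-- Illustrative input with comments before, between and after log lines.
sample :: String
sample =
  unlines
    [ "-- header comment"
    , "a [x] [y] c 0.5"
    , "-- between"
    , "b [p q] [r] d 1.0"
    , "-- trailing"
    ]
```

Running parse final "" sample in GHCi should yield the two log lines with all three comment lines skipped. Note that comments still requires a terminating newline, as in the question, so a comment on the very last line without one would be rejected by this sketch.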

Daniel Fischer answered Oct 10 '22 03:10
