
Parsec start-of-row pattern?

I am trying to parse MediaWiki text using Parsec. Some constructs in MediaWiki markup can only occur at the start of a line (such as the header markup ==header level 2==). In a regular expression I would use an anchor (such as ^) to match the start of a line.

One attempt in GHCi is

Prelude Text.Parsec> parse (char '\n' *> string "==" *> many1 letter <* string "==") "" "\n==hej=="
Right "hej"

but this is not good enough, since it requires a preceding newline and therefore fails on the first line of a file. I feel like this should be a solved problem...

What is the most idiomatic way to match the start of a line in Parsec?

asked Sep 19 '25 17:09 by LudvigH


1 Answer

You can use getPosition and sourceColumn to find out which column the parser is currently at. The column number is 1 when the current position is at the start of a line, that is, at the start of the input or immediately after a \n. (Parsec's default position update for Char streams resets the column only on \n; a lone \r is counted as an ordinary character.)
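To see how the column behaves, here is a minimal sketch that consumes a given number of characters and then reports the column. The names columnAfter and columnAt are hypothetical helpers made up for this illustration, not part of Parsec.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: consume n arbitrary characters, then return
-- the column the parser has reached.
columnAfter :: Int -> Parser Int
columnAfter n = count n anyChar *> (sourceColumn <$> getPosition)

-- Hypothetical helper: run columnAfter on an input string.
columnAt :: Int -> String -> Either ParseError Int
columnAt n = parse (columnAfter n) ""
```

With these definitions, columnAt 0 "ab\ncd" and columnAt 3 "ab\ncd" should both give Right 1 (start of input, and just after the newline), while columnAt 2 "ab\ncd" gives Right 3.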

There isn't a built-in combinator for this, but you can easily make it:

import Text.Parsec
import Control.Monad (guard)

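-- Succeeds without consuming input when the parser is at column 1;
-- otherwise fails without consuming input.
startOfLine :: Monad m => ParsecT s u m ()
startOfLine = do
    pos <- getPosition
    guard (sourceColumn pos == 1) <?> "start of line"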

Now you can write your header parser as:

header :: Parsec String () String
header = startOfLine *> string "==" *> many1 letter <* string "=="

answered Sep 21 '25 09:09 by 4castle
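Putting the pieces together, the following self-contained sketch can be loaded into GHCi to try the parser on the first line and on a later line. It specialises the types to String input for brevity, and headerAfter is a hypothetical helper added only to move the parser to a chosen offset before trying header.

```haskell
import Control.Monad (guard)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Succeeds only at column 1; never consumes input.
startOfLine :: Parser ()
startOfLine = do
    pos <- getPosition
    guard (sourceColumn pos == 1) <?> "start of line"

-- A level-2 header whose title consists of letters only,
-- as in the question's example.
header :: Parser String
header = startOfLine *> string "==" *> many1 letter <* string "=="

-- Hypothetical helper: skip n characters, then try to parse a header.
headerAfter :: Int -> Parser String
headerAfter n = count n anyChar *> header
```

Here parse header "" "==hej==" succeeds on the first line of the input, parse (headerAfter 1) "" "\n==hej==" succeeds after a newline, and parse (headerAfter 1) "" "x==hej==" fails because the header would start at column 2.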