Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How can I write a parser using Parsec that only accepts unique elements?

I have recently started learning Haskell and have been trying my hand at Parsec. However, for the past couple of days I have been stuck with a problem that I have been unable to find the solution to. So what I am trying to do is write a parser that can parse a string like this:

<"apple", "pear", "pineapple", "orange">

The code that I wrote to do that is:

collection :: Parser [String]    
collection = (char '<') *> (string `sepBy` char ',')) <* (char '>')

string :: Parser String
string = char '"' *> (many (noneOf ['\"', '\r', '\n', '"'])) <* char '"'

This works fine for me as it is able to parse the string that I have defined above. Nevertheless, I would now like to enforce the rule that every element in this collection must be unique and that is where I am having trouble. One of the first results I found when searching on the internet was this one, which suggest the usage of the nub function. Although the problem stated in that question is not the same, it would in theory solve my problem. But what I don't understand is how I can apply this function within a Parser. I have tried adding the nub function to several parts of the code above without any success. Later I also tried doing it the following way:

 collection :: Parser [String]
 collection = do
  char '<'
  value <- (string `sepBy` char ','))
  char '>'
  return nub value

But this does not work as the type does not match what nub is expecting, which I believe is one of the problems I am struggling with. I am also not entirely sure whether nub is the right way to go. My fear is that I am going in the wrong direction and that I won't be able to solve my problem like this. Is there perhaps something I am missing? Any advice or help anyone could provide would be greatly appreciated.

like image 652
Dexter Avatar asked Aug 23 '15 15:08

Dexter


1 Answers

The Parsec Parser type is an instance of MonadPlus which means that we can always fail (ie cause a parse error) whenever we want. A handy function for this is guard:

guard :: MonadPlus m => Bool -> m ()

This function takes a boolean. If it's true, it return () and the whole computation (a parse in this case) does not fail. If it's false, the whole thing fails.

So, as long as you don't care about efficiency, here's a reasonable approach: parse the whole list, check for whether all the elements are unique and fail if they aren't.

To do this, the first thing we have to do is write a predicate that checks if every element of a list is unique. nub does not quite do the right thing: it return a list with all the duplicates taken out. But if we don't care much about performance, we can use it to check:

allUnique ls = length (nub ls) == length ls

With this predicate in hand, we can write a function unique that wraps any parser that produces a list and ensures that list is unique:

unique parser = do res <- parser
                   guard (allUnique res)
                   return res

Again, if guard is give True, it doesn't affect the rest of the parse. But if it's given False, it will cause an error.

Here's how we could use it:

λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\">"
Right ["apple","pear","pineapple","orange"]
λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):unknown parse error

This does what you want. However, there's a problem: there is no error message supplied. That's not very user friendly! Happily, we can fix this using <?>. This is an operator provided by Parsec that lets us set the error message of a parser.

unique parser = do res <- parser
                   guard (allUnique res) <?> "unique elements"
                   return res

Ahhh, much better:

λ> parse (unique collection) "<interactive>" "<\"apple\",\"pear\",\"pineapple\",\"orange\",\"apple\">"
Left "<interactive>" (line 1, column 46):
expecting unique elements

All this works but, again, it's worth noting that it isn't efficient. It parses the whole list before realizing elements aren't unique, and nub takes quadratic time. However, this works and it's probably more than good enough for parsing small to medium-sized files: ie most things written by hand rather than autogenerated.

like image 164
Tikhon Jelvis Avatar answered Nov 13 '22 19:11

Tikhon Jelvis