I'm looking at this library, which has little documentation: https://pythonhosted.org/parsec/#examples
I understand there are alternatives, but I'd like to use this library.
I have the following string I'd like to parse:
mystr = """
<kv>
  key1: "string"
  key2: 1.00005
  key3: [1,2,3]
</kv>
<csv>
date,windspeed,direction
20190805,22,NNW
20190805,23,NW
20190805,20,NE
</csv>"""
While I'd like to parse the whole thing, I'd settle for just grabbing the <tags>. I have:
>>> import parsec
>>> tag_start = parsec.Parser(lambda x: x == "<")
>>> tag_end = parsec.Parser(lambda x: x == ">")
>>> tag_name = parsec.Parser(parsec.Parser.compose(parsec.many1, parsec.letter))
>>> tag_open = parsec.Parser(parsec.Parser.joint(tag_start, tag_name, tag_end))
OK, looks good. Now to use it:
>>> tag_open.parse(mystr)
Traceback (most recent call last):
...
TypeError: <lambda>() takes 1 positional argument but 2 were given
This fails. I don't understand why the error says my lambda was given two arguments when it clearly takes one. How can I proceed?
My optimal desired output for all the bonus points is:
[
{"type": "tag", 
 "name" : "kv",
 "values"  : [
    {"key1" : "string"},
    {"key2" : 1.00005},
    {"key3" : [1,2,3]}
  ]
},
{"type" : "tag",
"name" : "csv", 
"values" : [
    {"date" : 20190805, "windspeed" : 22, "direction": "NNW"},
    {"date" : 20190805, "windspeed" : 23, "direction": "NW"},
    {"date" : 20190805, "windspeed" : 20, "direction": "NE"}
  ]
}
]
The output I'd settle for understanding in this question is using functions like those described above for start and end tags to generate:
[
  {"tag": "kv"},
  {"tag" : "csv"}
]
And simply be able to parse arbitrary xml-like tags out of the messy mixed text entry.
According to the library's tests, one way to parse your string is the following:
from parsec import *
# Characters allowed between the tags and inside them.
possible_chars = letter() | space() | one_of('/.,:"[]') | digit()
# Skip any leading text and "<", record the tag body with its position, then consume ">".
parser = many(many(possible_chars) + string("<") >> mark(many(possible_chars)) << string(">"))
parser.parse(mystr)
# [((1, 1), ['k', 'v'], (1, 3)), ((5, 1), ['/', 'k', 'v'], (5, 4)), ((6, 1), ['c', 's', 'v'], (6, 4)), ((11, 1), ['/', 'c', 's', 'v'], (11, 5))]
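To get from that result to the simpler output the question asks for, the marked triples can be post-processed in plain Python. This is only a sketch: the list below is copied from the output shown above, and opening_tags is a hypothetical helper name, not part of parsec.

```python
# Result copied from the parser output shown above: (start, chars, end) triples.
marked = [
    ((1, 1), ['k', 'v'], (1, 3)),
    ((5, 1), ['/', 'k', 'v'], (5, 4)),
    ((6, 1), ['c', 's', 'v'], (6, 4)),
    ((11, 1), ['/', 'c', 's', 'v'], (11, 5)),
]


def opening_tags(marked_results):
    """Join each character list into a name and keep only opening tags."""
    tags = []
    for _start, chars, _end in marked_results:
        name = "".join(chars)
        if not name.startswith("/"):  # skip closing tags such as "/kv"
            tags.append({"tag": name})
    return tags


print(opening_tags(marked))  # [{'tag': 'kv'}, {'tag': 'csv'}]
```

The positions are ignored here, but they could be kept if you later want to slice the text between an opening tag and its closing tag.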
The construction of the parser:
For convenience, we first define the characters we wish to match. parsec provides many primitive parsers:
letter(): matches any alphabetic character,
string(str): matches any specified string str,
space(): matches any whitespace character,
spaces(): matches multiple whitespace characters,
digit(): matches any digit,
eof(): matches the end of the input string,
regex(pattern): matches a provided regex pattern,
one_of(str): matches any character from the provided string,
none_of(str): matches any character not in the provided string.
We can combine them with operators, according to the docs:
|: This combinator implements choice. The parser p | q first applies p.
    If it succeeds, the value of p is returned.
    If p fails without consuming any input, parser q is tried.
    Note: there is no backtracking,
+: Joins two or more parsers into one. Returns the aggregate of the results
    from these parsers,
^: Choice with backtrack. This combinator is used whenever arbitrary
    look ahead is needed. The parser p ^ q first applies p; if it succeeds,
    the value of p is returned. If p fails, it pretends that it hasn't consumed
    any input, and then parser q is tried,
<<: Ends with a specified parser; the end parser consumes its input,
<: Ends with a specified parser; the end parser does not consume
    any input,
>>: Sequentially compose two actions, discarding any value produced
    by the first,
mark(p): Marks the line and column information of the result of the parser p.
Then there are multiple "combinators":
times(p, mint, maxt=None): Repeats parser p from mint to maxt times,
count(p, n): Repeats parser p n times. If n is smaller than or equal to zero, the parser returns an empty list,
optional(p, default_value=None): Makes a parser optional. On success, returns the result; otherwise silently returns default_value without raising an exception. If default_value is not provided, None is returned instead,
many(p): Repeats parser p zero or more times,
many1(p): Repeats parser p at least once,
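separated(p, sep, mint, maxt=None, end=None): Repeats parser p, separated by sep, from mint to maxt times; end controls whether a trailing separator is optional (None), required (True), or not allowed (False),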
sepBy(p, sep): parses zero or more occurrences of parser p, separated by delimiter sep,
sepBy1(p, sep): parses at least one occurrence of parser p, separated by delimiter sep,
endBy(p, sep): parses zero or more occurrences of p, separated and ended by sep,
endBy1(p, sep): parses at least one occurrence of p, separated and ended by sep,
sepEndBy(p, sep): parses zero or more occurrences of p, separated and optionally ended by sep, 
sepEndBy1(p, sep): parses at least one occurrence of p, separated and optionally ended by sep.
Using all of that, we have a parser which repeatedly matches any run of possible_chars followed by a <, then marks the run of possible_chars up to the closing >, which is consumed but not returned.
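As for the TypeError in the question: as far as I can tell, parsec.Parser wraps a function that is called with two arguments, the text and the index to start from, so a one-argument lambda breaks as soon as the parser runs. The sketch below reproduces this with MiniParser, a hypothetical stand-in that only mimics the calling convention; the real library expects the function to return Value.success or Value.failure rather than the tuple used here.

```python
class MiniParser:
    """Hypothetical stand-in for parsec.Parser, reduced to the calling convention."""

    def __init__(self, fn):
        self.fn = fn

    def parse(self, text):
        # Like parsec, call the wrapped function with the text AND a start index.
        return self.fn(text, 0)


# The question's style of callback takes one argument, so the call fails.
bad = MiniParser(lambda x: x == "<")
try:
    bad.parse("<kv>")
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
print(error_message)


# A callback with the expected shape: (text, index) -> (next_index, value) or None.
def lt(text, index):
    if text[index:index + 1] == "<":
        return index + 1, "<"
    return None


good = MiniParser(lt)
print(good.parse("<kv>"))  # (1, '<')
```

In practice you rarely need to write such functions yourself; the ready-made string("<") and string(">") parsers already have the right shape.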