I'm looking at this library, which has little documentation: https://pythonhosted.org/parsec/#examples
I understand there are alternatives, but I'd like to use this library.
I have the following string I'd like to parse:
mystr = """
<kv>
key1: "string"
key2: 1.00005
key3: [1,2,3]
</kv>
<csv>
date,windspeed,direction
20190805,22,NNW
20190805,23,NW
20190805,20,NE
</csv>"""
While I'd like to parse the whole thing, I'd settle for just grabbing the <tags>. I have:
>>> import parsec
>>> tag_start = parsec.Parser(lambda x: x == "<")
>>> tag_end = parsec.Parser(lambda x: x == ">")
>>> tag_name = parsec.Parser(parsec.Parser.compose(parsec.many1, parsec.letter))
>>> tag_open = parsec.Parser(parsec.Parser.joint(tag_start, tag_name, tag_end))
OK, looks good. Now to use it:
>>> tag_open.parse(mystr)
Traceback (most recent call last):
...
TypeError: <lambda>() takes 1 positional argument but 2 were given
This fails. I don't understand why the error says my lambda expression was given two arguments when it clearly takes one. How can I proceed?
My optimal desired output for all the bonus points is:
[
{"type": "tag",
"name" : "kv",
"values" : [
{"key1" : "string"},
{"key2" : 1.00005},
{"key3" : [1,2,3]}
]
},
{"type" : "tag",
"name" : "csv",
"values" : [
{"date" : 20190805, "windspeed" : 22, "direction": "NNW"},
{"date" : 20190805, "windspeed" : 23, "direction": "NW"},
{"date" : 20190805, "windspeed" : 20, "direction": "NE"}
]
}
]
The output I'd settle for understanding in this question is using functions like those described above for start and end tags to generate:
[
{"tag": "kv"},
{"tag" : "csv"}
]
And simply be able to parse arbitrary xml-like tags out of the messy mixed text entry.
According to the library's tests, one way to parse your string is the following:
from parsec import *
possible_chars = letter() | space() | one_of('/.,:"[]') | digit()
parser = many(many(possible_chars) + string("<") >> mark(many(possible_chars)) << string(">"))
parser.parse(mystr)
# [((1, 1), ['k', 'v'], (1, 3)), ((5, 1), ['/', 'k', 'v'], (5, 4)), ((6, 1), ['c', 's', 'v'], (6, 4)), ((11, 1), ['/', 'c', 's', 'v'], (11, 5))]
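To get from that marked output to the minimal [{"tag": "kv"}, {"tag": "csv"}] form asked for in the question, a small post-processing step is enough. This is only a sketch: it works on the result list copied from the comment above rather than calling parsec, and opening_tags is a helper name made up for illustration.

```python
# Result copied from the parser output shown above:
# each entry is (start_position, characters, end_position).
marked = [
    ((1, 1), ['k', 'v'], (1, 3)),
    ((5, 1), ['/', 'k', 'v'], (5, 4)),
    ((6, 1), ['c', 's', 'v'], (6, 4)),
    ((11, 1), ['/', 'c', 's', 'v'], (11, 5)),
]


def opening_tags(marked_results):
    """Join each character list into a name and keep only opening tags."""
    tags = []
    for _start, chars, _end in marked_results:
        name = "".join(chars)
        if not name.startswith("/"):  # skip closing tags such as "/kv"
            tags.append({"tag": name})
    return tags


tags = opening_tags(marked)
print(tags)
```

Closing tags are dropped here simply because their name starts with "/"; if you need to check that tags are balanced, you would keep them and match them up instead.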
The construction of the parser:
For the sake of convenience, we first define the characters we wish to match. parsec provides many types:
letter(): matches any alphabetic character,
string(str): matches any specified string str,
space(): matches any whitespace character,
spaces(): matches multiple whitespace characters,
digit(): matches any digit,
eof(): matches EOF flag of a string,
regex(pattern): matches a provided regex pattern,
one_of(str): matches any character from the provided string,
none_of(str): matches characters which are not in the provided string.
We can combine them with operators, according to the docs:
|: This combinator implements choice. The parser p | q first applies p. If it succeeds, the value of p is returned. If p fails without consuming any input, parser q is tried. NOTICE: without backtrack,
+: Joins two or more parsers into one. Returns the aggregate of the results from these two parsers,
^: Choice with backtrack. This combinator is used whenever arbitrary look ahead is needed. The parser p ^ q first applies p; if it succeeds, the value of p is returned. If p fails, it pretends that it hasn't consumed any input, and then parser q is tried,
<<: Ends with a specified parser, and at the end the parser has consumed the end flag,
<: Ends with a specified parser, and at the end the parser hasn't consumed any input,
>>: Sequentially composes two actions, discarding any value produced by the first,
mark(p): Marks the line and column information of the result of the parser p.
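To make the difference between | and ^ concrete, here is a small self-contained sketch. It does not use parsec: string, choice and try_choice are stand-ins written for illustration, and each parser is modelled as a plain function of (text, index) returning (ok, value, new_index). It only assumes the semantics quoted above.

```python
def string(s):
    """Parser for the literal s; a parser here is a function of (text, index)."""
    def parse(text, index):
        matched = 0
        while matched < len(s) and text[index + matched:index + matched + 1] == s[matched]:
            matched += 1
        if matched == len(s):
            return True, s, index + matched
        # On failure, the returned index shows how much input was consumed.
        return False, None, index + matched
    return parse


def choice(p, q):
    """Stand-in for `|`: q is tried only if p failed without consuming input."""
    def parse(text, index):
        ok, value, new_index = p(text, index)
        if ok or new_index != index:
            return ok, value, new_index
        return q(text, index)
    return parse


def try_choice(p, q):
    """Stand-in for `^`: if p fails, rewind to the original index and try q."""
    def parse(text, index):
        ok, value, new_index = p(text, index)
        if ok:
            return ok, value, new_index
        return q(text, index)
    return parse


no_backtrack = choice(string("<csv>"), string("<kv>"))
with_backtrack = try_choice(string("<csv>"), string("<kv>"))

print(no_backtrack("<kv>", 0))    # fails: "<" was consumed before the mismatch
print(with_backtrack("<kv>", 0))  # succeeds with "<kv>"
```

Both alternatives start with "<", so the plain choice gives up after the first one consumes that character, while the backtracking version rewinds and matches the second.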
Then there are multiple "combinators":
times(p, mint, maxt=None): repeats parser p from mint to maxt times,
count(p, n): repeats parser p n times. If n is smaller than or equal to zero, the parser returns an empty list,
optional(p, default_value=None): makes a parser optional. On success it returns the result; otherwise it returns default_value silently, without raising any exception. If default_value is not provided, None is returned instead,
many(p): repeats parser p from zero to infinitely many times,
many1(p): repeats parser p at least once,
separated(p, sep, mint, maxt=None, end=None): repeats parser p separated by sep between mint and maxt times; end controls whether a trailing separator is optional, required, or not allowed,
sepBy(p, sep): parses zero or more occurrences of parser p, separated by delimiter sep,
sepBy1(p, sep): parses at least one occurrence of parser p, separated by delimiter sep,
endBy(p, sep): parses zero or more occurrences of p, separated and ended by sep,
endBy1(p, sep): parses at least one occurrence of p, separated and ended by sep,
sepEndBy(p, sep): parses zero or more occurrences of p, separated and optionally ended by sep,
sepEndBy1(p, sep): parses at least one occurrence of p, separated and optionally ended by sep.
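As an illustration of what a sepBy-style combinator does, here is a rough stdlib-only sketch applied to one of the CSV rows from the question. It is not parsec's implementation: regex, char, sep_by1 and sep_by are hypothetical helpers, and a parser is again modelled as a function of (text, index) returning (ok, value, new_index). Leaving a dangling separator unconsumed is a choice made for this sketch.

```python
import re


def regex(pattern):
    """Parser matching a regular expression at the current index."""
    compiled = re.compile(pattern)

    def parse(text, index):
        m = compiled.match(text, index)
        if m:
            return True, m.group(0), m.end()
        return False, None, index
    return parse


def char(c):
    """Parser matching the single character c."""
    def parse(text, index):
        if text[index:index + 1] == c:
            return True, c, index + 1
        return False, None, index
    return parse


def sep_by1(p, sep):
    """One or more p separated by sep; returns the list of p's values."""
    def parse(text, index):
        ok, value, index = p(text, index)
        if not ok:
            return False, None, index
        values = [value]
        while True:
            ok, _, after_sep = sep(text, index)
            if not ok:
                break
            ok, value, after_item = p(text, after_sep)
            if not ok:
                break  # leave the dangling separator unconsumed
            values.append(value)
            index = after_item
        return True, values, index
    return parse


def sep_by(p, sep):
    """Zero or more p separated by sep; succeeds with [] if p never matches."""
    def parse(text, index):
        ok, values, new_index = sep_by1(p, sep)(text, index)
        if ok:
            return True, values, new_index
        return True, [], index
    return parse


field = regex(r"[A-Za-z0-9]+")
row = sep_by(field, char(","))

print(row("20190805,22,NNW", 0))
```

With parsec itself the equivalent would presumably be built from sepBy and the library's own parsers, but I have not run that against the data above.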
Using all of that, we have a parser which matches many occurrences of many possible_chars, followed by a <, then we mark the many occurrences of possible_chars up until >.
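As for the TypeError in the question: as far as I can tell, parsec.Parser wraps a function that the library later calls with two arguments, the text and the current index, so a one-argument lambda fails as soon as parsing starts. The sketch below reproduces that with a stand-in class; MiniParser and tag_start are hypothetical names, not parsec's API, and the (ok, value, new_index) return shape is an assumption made for this example only.

```python
class MiniParser:
    """Hypothetical stand-in for a parser wrapper; not parsec's own class."""

    def __init__(self, fn):
        self.fn = fn

    def parse(self, text):
        # The wrapped function receives both the text and a start index.
        return self.fn(text, 0)


# A one-argument lambda, as in the question, cannot accept (text, index).
broken = MiniParser(lambda x: x == "<")
try:
    broken.parse("<kv>")
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)


def tag_start(text, index):
    """Two-argument parser function matching a single '<'."""
    if text[index:index + 1] == "<":
        return True, "<", index + 1
    return False, None, index


working = MiniParser(tag_start)
result = working.parse("<kv>")
print(result)
```

In practice you rarely need to construct Parser by hand; primitives such as string("<"), as used in the answer above, already give you parsers with the right shape.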