I am starting to write a BibTeX parser. The first thing I would like to do is parse a braced item. A braced item could be an author field or a title, for example. There might be nested braces within the field. The following code does not handle nested braces:
use v6;
my $str = q:to/END/;
author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},
END
$str .= chomp;
grammar ExtractBraced {
rule TOP {
'author=' <braced-item> .*
}
rule braced-item { '{' <-[}]>* '}' }
}
ExtractBraced.parse( $str ).say;
Output:
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
braced-item => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}」
Now, in order to make the parser accept nested braces, I would like to keep a counter of the opening braces parsed so far and decrement it whenever a closing brace is encountered. If the counter reaches zero, I assume that the complete item has been parsed.
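The counting idea can be sketched outside a grammar first. This is only a minimal illustration: the `extract-braced` helper is a hypothetical name, and it assumes that every `{` and `}` counts toward nesting (escaped braces such as `\{` are not treated specially).

```raku
# Hypothetical helper: returns the balanced braced item that starts at the
# first opening brace in $text, or Nil if the braces never balance.
sub extract-braced(Str $text) {
    my $start = $text.index('{');
    return Nil without $start;          # no opening brace at all
    my $depth = 0;
    for $start ..^ $text.chars -> $i {
        given $text.substr($i, 1) {
            when '{' { $depth++ }       # one level deeper
            when '}' { $depth-- }       # one level back out
        }
        # Depth zero means the brace that opened the item has been closed.
        return $text.substr($start, $i - $start + 1) if $depth == 0;
    }
    return Nil;                         # input ended before the braces balanced
}

say extract-braced(「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」);
```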
To follow this idea, I tried to split up the braced-item regex so that a grammar action can run on each character. (The action method on the braced-item-char regex below should then handle the brace counter):
grammar ExtractBraced {
rule TOP {
'author=' <braced-item> .*
}
rule braced-item { '{' <braced-item-char>* '}' }
rule braced-item-char { <-[}]> }
}
However, the parsing now fails. It is probably a silly mistake, but I cannot see why it fails now.
Without knowing how you want the resultant data to look, I would change it to something like this:
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;
grammar ExtractBraced {
token TOP {
'author='
$<author> = <.braced-item>
.*
}
token braced-item {
'{' ~ '}'
[
|| <- [{}] >+
|| <.before '{'> <.braced-item>
]*
}
}
ExtractBraced.parse( $str ).say;
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
author => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}」
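The switch from rule to token is probably what matters for the failure in the question. A rule turns on sigspace, so whitespace after an atom becomes an implicit `<.ws>` call, and the default `ws` does not match between two word characters. Declared as a rule, braced-item-char would therefore match `B` and then fail on the implicit `<.ws>` before `e`. The sketch below compares the two declarations; the grammar names `WithRule` and `WithToken` are hypothetical and used only for this comparison.

```raku
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;

# As in the question: every regex is a rule, so sigspace applies everywhere.
grammar WithRule {
    rule TOP { 'author=' <braced-item> .* }
    rule braced-item { '{' <braced-item-char>* '}' }
    rule braced-item-char { <-[}]> }    # behaves like <-[}]> <.ws>
}

# Same grammar, except the single-character regex is a token,
# so no implicit <.ws> follows the character class.
grammar WithToken {
    rule TOP { 'author=' <braced-item> .* }
    rule braced-item { '{' <braced-item-char>* '}' }
    token braced-item-char { <-[}]> }
}

say so WithRule.parse($str);     # whether the all-rule version parses
say so WithToken.parse($str);    # whether the token version parses
```

With the token declaration the grammar should parse again, although it still stops at the first closing brace, just like the original single-regex version.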
If you want a bit more structure, it might look more like this:
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;
grammar ExtractBraced {
token TOP {
'author='
$<author> = <.braced-item>
.*
}
token braced-part {
|| <- [{}] >+
|| <.before '{'> <braced-item>
}
token braced-item {
'{' ~ '}'
<braced-part>*
}
}
class Print {
method TOP ($/){
make $<author>.made
}
method braced-part ($/){
make $<braced-item>.?made // ~$/
}
method braced-item ($/){
make [~] @<braced-part>».made
}
}
my $r = ExtractBraced.parse( $str, :actions(Print) );
say $r;
put();
say $r.made;
「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」
author => 「{Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.}」
braced-part => 「Belayneh, M. and Geiger, S. and Matth」
braced-part => 「{\"{a}}」
braced-item => 「{\"{a}}」
braced-part => 「\"」
braced-part => 「{a}」
braced-item => 「{a}」
braced-part => 「a」
braced-part => 「i, S.K.」
Belayneh, M. and Geiger, S. and Matth\"ai, S.K.
Note that the + on <-[{}]>+ is an optimization, as is <before '{'>; both can be omitted and it will still work.
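As a rough illustration of that note, here is the first grammar again with both optimizations removed. The name `ExtractBracedPlain` is hypothetical; everything else follows the grammar shown earlier.

```raku
my $str = 「author={Belayneh, M. and Geiger, S. and Matth{\"{a}}i, S.K.},」;

grammar ExtractBracedPlain {
    token TOP {
        'author='
        $<author> = <.braced-item>
        .*
    }
    token braced-item {
        '{' ~ '}'
        [
        || <-[{}]>          # one non-brace character at a time (no +)
        || <.braced-item>   # a nested braced group (no lookahead first)
        ]*
    }
}

my $m = ExtractBracedPlain.parse($str);
say ~$m<author>;
```

Without the + the character class is tried once per character rather than once per run, and without the lookahead the nested braced-item is simply attempted and fails on its own leading '{' when the next character is not an opening brace.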