Default definition of the whitespace method ws in a grammar

Tags: raku

According to the documentation, the default definition of the ws method in a grammar matches zero or more whitespace characters, as long as that point is not within a word:

regex ws { <!ww> \s* }

What is the difference between this definition and the following:

regex ws { \s+ }

Why is the zero-width assertion <!ww> used instead of the simpler \s+? I also note that the default definition can match zero whitespace characters, but when would that actually happen? Wouldn't it be clearer if it used \s+ instead of \s*?


asked Apr 01 '19 19:04

Håkon Hægland

1 Answer

The ww assertion succeeds when there are characters matching \w on both sides of the current position. The ! inverts it, meaning <!ww> matches:

At the start of the string
At the end of the string
When there's a non-\w character before the current position (such as between "+" and "a")
When there's a non-\w character after the current position (such as between "a" and "+")

Effectively, then, whitespace can never be considered to occur between two word characters. Between two non-word characters, or between a word character and a non-word character, whitespace can be considered to occur, even when it is zero characters long.
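As a small sketch of the cases listed above, the following checks every position of the string ab+c and keeps those where <!ww> succeeds. The string and the variable names are only illustrative; the check is anchored at each position with the :pos adverb of Str.match.

```raku
my $s = 'ab+c';

# Try <!ww> anchored at each position, from 0 up to the end of the string.
my @ok = (0 .. $s.chars).grep: -> $i {
    so $s.match(/ <!ww> /, :pos($i))
};

say @ok;   # [0 2 3 4]
```

Position 1 is the only one rejected, because it sits between the two word characters a and b.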

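This follows what many of the languages we might wish to parse need. For example, consider ab+cd. The default ws will match on either side of the +, where it matches zero characters, but it will not match within an identifier such as ab. With \s+ instead, a rule would demand at least one whitespace character around the +, so ab+cd would fail to parse.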
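To make that concrete, here is a minimal sketch using two hypothetical grammars (Sum and Letters are names made up for this illustration). Both rely on the default ws, which a rule inserts implicitly as <.ws> wherever whitespace follows an atom.

```raku
# Hypothetical grammars for illustration; both use the default ws.
grammar Sum {
    rule  TOP  { <term> '+' <term> }
    token term { \w+ }
}

grammar Letters {
    rule TOP { a b }    # behaves like: a <.ws> b <.ws>
}

say so Sum.parse('ab+cd');      # True: zero-width ws on either side of +
say so Sum.parse('ab + cd');    # True: ws consumes the spaces
say so Letters.parse('a b');    # True
say so Letters.parse('ab');     # False: <!ww> fails between a and b
```

The last line shows the effect of <!ww>: without it, \s* would happily match nothing between a and b, and 'ab' would be accepted.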

For languages where that isn't suitable, it's simply a matter of overriding the default ws for whatever that language needs.
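As one possible example of such an override, a line-oriented language might want whitespace to stop at newlines. The sketch below assumes exactly that; the grammar names AnySpace and SameLine are hypothetical, and \h* is just one choice of replacement.

```raku
# Default ws: any whitespace, including newlines, may separate the digits.
grammar AnySpace {
    rule TOP { \d \d }
}

# Overridden ws: only horizontal whitespace is allowed between the digits.
grammar SameLine {
    rule  TOP { \d \d }
    token ws  { \h* }
}

say so AnySpace.parse("4 \n 5");   # True
say so SameLine.parse("4 \n 5");   # False: \h* stops at the newline
say so SameLine.parse("4   5");    # True
```

Note that this particular override drops <!ww>, so SameLine would also accept "45"; keep the assertion in your own ws if that is not wanted.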


answered Sep 25 '22 21:09

Jonathan Worthington
