When is white space really important in Perl6 grammars?

Tags: whitespace, grammar, raku

Can someone clarify when whitespace is significant in rules in Perl 6 grammars? I am learning some of it by trial and error, but I can't find the actual rules in the documentation.

Example 1:

rule number {
    <pm> \d '.'? \d*[ <pm> \d* ]?
}

rule pm {
    [ '+' || '-' ]?
}

This matches a number such as 2.68156e+154 and does not care about the spaces present in rule number. However, if I add a space after \d*, it fails (i.e., <pm> \d '.'? \d* [ <pm> \d* ]? fails).

Example 2: If I am trying to find literals in the middle of a word, then the spacing around them is important. For example, in finding the entry Double_t Delta_phi_R_1_9_pTproj_13_dat_cent_fx3001[52] = {

grammar TOP {
    ^ .*? <word-to-find> .* ?
}
rule word-to-find {
    \w*?fx\w*
}

This finds the word. However, if the definition of the rule word-to-find is changed to fx, or \w* fx\w*, or \w*fx \w*, then it won't match.

Also, the definition '[52]' will match, while the definition 'fx[52]' will not.

Thanks for any insight. A pointer to the relevant part of the documentation would help greatly!

asked Feb 20 '18 18:02

dave

2 Answers

In a rule, whitespace is turned into a <.ws> (that is, a non-capturing call to the ws token) except:

At the start of the rule, before the first atom
At the start of a [ (group) or ( (positional capture)
After ||, |, and &
After a variable declaration (:my $x = 'foo';)
After a code block
After the % operator for introducing a separator
After the ~ goal-matching operator
After an internal modifier (such as :i)
Inside of a construct like $<var> = x

Or, probably easier to remember, it will be inserted after any construct that could match some characters and after any zero-width assertion.
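To make the insertion points concrete, here is a minimal sketch. The grammar Demo and its rule names are made up for illustration; it assumes the default ws token. The token explicit is a hand-written version of what rule spaced should turn into under the description above.

```raku
grammar Demo {
    # The space before `a` is dropped; the spaces after `a` and `b` become <.ws>.
    rule  spaced   { a b }
    # Hand-written token with the <.ws> calls spelled out.
    token explicit { a <.ws> b <.ws> }
    # No whitespace between the two letters, so nothing is inserted between them.
    rule  tight    { ab }
}

say so Demo.parse('a b',  :rule<spaced>);    # True
say so Demo.parse('ab',   :rule<spaced>);    # False: <.ws> fails in the middle of a word
say so Demo.parse(' a b', :rule<spaced>);    # False: no <.ws> before the first atom
say so Demo.parse('a b',  :rule<explicit>);  # True
say so Demo.parse('ab',   :rule<explicit>);  # False
say so Demo.parse('ab',   :rule<tight>);     # True
```

The second line is probably what bites patterns such as \w* fx\w* from the question: once a space appears between two atoms, the input has to have a non-word boundary at that point.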

An important design goal in these rules is to never insert <.ws> somewhere that impedes Longest Token Matching. For example, consider rule foo:sym<ba> { [ bar | baz ] }, which is equivalent to token foo:sym<ba> { [ bar <.ws> | baz <.ws> ] <.ws> }. The default ws implementation is non-declarative (thanks to its use of <!ww>), meaning that it would break longest token matching at the protoregex level were it inserted at the start of the rule, and at the alternation level were it inserted at the start of the group or after |.

Note that these rules only apply to rule, not to token or regex. However, they can be switched on at any point using :s and switched off using :!s in any of those (rule really just means "pretend there's a :s at the start").
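A small sketch of toggling the behaviour by hand. The grammar Toggle and its rule names are invented for this example, and the default ws token is assumed.

```raku
grammar Toggle {
    # A token with sigspace switched on: whitespace after atoms becomes <.ws>.
    token on-in-token { :s a b }
    # A rule with sigspace switched off: whitespace in the pattern is ignored again.
    rule  off-in-rule { :!s a b }
}

say so Toggle.parse('a b', :rule<on-in-token>);  # True
say so Toggle.parse('ab',  :rule<on-in-token>);  # False
say so Toggle.parse('ab',  :rule<off-in-rule>);  # True
say so Toggle.parse('a b', :rule<off-in-rule>);  # False
```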

Finally, the ws rule (which defaults to token ws { <!ww> \s* }) can be overridden in a grammar to define what whitespace means in the language being parsed.
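As a rough illustration of overriding ws, the sketch below uses two made-up grammars, Assignment and OneLineAssignment. The second one redefines ws so that only horizontal whitespace separates tokens. That choice is just an assumption for the example, not something the question requires.

```raku
grammar Assignment {
    rule  TOP        { <identifier> '=' <number> }
    token identifier { \w+ }
    token number     { \d+ }
}

# Hypothetical variant: newlines no longer count as skippable whitespace.
grammar OneLineAssignment is Assignment {
    token ws { <!ww> \h* }
}

say so Assignment.parse("x =\n42");         # True: default ws skips the newline
say so OneLineAssignment.parse("x =\n42");  # False: \h* does not match a newline
say so OneLineAssignment.parse("x = 42");   # True
```

Because the inherited rule TOP calls <.ws> as a method, the override in the subclass is picked up without touching TOP itself.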

answered Nov 07 '22 15:11

Jonathan Worthington

can someone clarify when white space is significant in rules in Perl 6 grammars?

When :sigspace is in effect.

I'll provide a little more detail below. If you or anyone else reading this needs further details, let me know via comments and I'll expand further.

First, let's eliminate one possible source of confusion, namely the meaning of the words rule and regex in the context of Perl 6, before I provide the doc link.

The word rule may be used in either a generic sense ("the regular expression, string matching and general-purpose parsing facility of Perl 6") or as a keyword (rule). Similarly, regex may be used to mean much the same thing as the generic rule or as a keyword (regex).

With that preamble out of the way, here's a link to the :sigspace doc section.

Note that the rule keyword implicitly inserts a :sigspace such that it takes effect immediately following the first atom in the declared rule, and that the effect is lexical. See @smls's answer to another SO question, especially the first two bullet points, for detailed discussion of these two important details.
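A minimal sketch of the "lexical" part, using an invented grammar Scope: the sigspace in rule TOP applies only to whitespace written inside TOP, not to whitespace inside the token it calls.

```raku
grammar Scope {
    # Sigspace applies here: a <.ws> follows each <chunk> call.
    rule  TOP   { <chunk> <chunk> }
    # But not here: in a token, this pattern is just the literal 'ab'.
    token chunk { a b }
}

say so Scope.parse('ab ab');    # True
say so Scope.parse('a b a b');  # False: chunk does not allow a space inside
say so Scope.parse('abab');     # False: <.ws> between the chunks fails mid-word
```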

You may also find my answer to another SO question dealing with whitespace/tokenization helpful.

Hth.

answered Nov 07 '22 14:11

raiph
