Default definition of the whitespace method ws in a grammar

Tags:

grammar

raku

According to the documentation, the default definition of the ws method in a grammar matches zero or more whitespace characters, as long as that point is not within a word:

regex ws { <!ww> \s* }

What is the difference between this definition and the following:

regex ws { \s+ }

I wonder why the zero-width assertion <!ww> is used instead of the simpler \s+. I also note that the default definition allows matching zero whitespace characters, but when would that actually happen? Wouldn't it be clearer to use \s+ instead of \s*?
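
A small sketch that makes the difference concrete, assuming a current Rakudo. The grammar names DefaultWS and PlusWS are hypothetical. TOP calls <.ws> explicitly around a + operator, once with the inherited default ws and once with the \s+ version from the question:

```raku
# DefaultWS (hypothetical name) relies on the built-in ws, i.e. <!ww> \s*
grammar DefaultWS {
    token TOP { <ident> <.ws> '+' <.ws> <ident> }
}

# PlusWS (hypothetical name) swaps in the \s+ definition from the question
grammar PlusWS is DefaultWS {
    regex ws { \s+ }
}

for 'ab+cd', 'ab + cd' -> $input {
    say "$input: default={ so DefaultWS.parse($input) } plus={ so PlusWS.parse($input) }";
}
```

With \s+, any place where the grammar calls <.ws> demands at least one space, so ab+cd is rejected. The zero-length match of the default ws is exactly what lets ab+cd and ab + cd both parse.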

Håkon Hægland asked Apr 01 '19 19:04



1 Answer

The ww assertion succeeds when there are characters matching \w on both sides of the current position. The ! negates it, so <!ww> matches:

  • At the start of the string
  • At the end of the string
  • When there's a non-\w character before the current position (such as between "+" and "a")
  • When there's a non-\w character after the current position (such as between "a" and "+")
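
To see these cases in action, here is a sketch that tests <!ww> at every position of ab+cd. It assumes Rakudo, where <ww> is available as a built-in zero-width assertion in any regex and Str.match accepts the :p adverb to anchor the attempt at a given offset:

```raku
my $s = 'ab+cd';
my @allowed;
for 0..$s.chars -> $pos {
    # :p($pos) anchors the zero-width <!ww> check at exactly this offset
    my $ok = so $s.match(/<!ww>/, :p($pos));
    @allowed.push($pos) if $ok;
    say "$pos: ", $ok ?? 'ws may match here' !! 'inside a word';
}
say @allowed;
```

Positions 0 and 5 are the ends of the string, and 2 and 3 sit next to the +. Only 1 (between a and b) and 4 (between c and d) are inside a word, so @allowed should come out as [0 2 3 5].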

Effectively, then, whitespace can never be considered to occur between two word characters. It can, however, occur between two non-word characters, or between a word character and a non-word character.

This is what many languages we might wish to parse need. For example, consider ab+cd: the default ws will match (as an empty match) on either side of the +, but it will not match within an identifier such as ab.
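
The "not within an identifier" behaviour can be checked with a backtracking regex that tries to split a single word in two. This is a sketch with hypothetical grammar names, assuming Rakudo; TwoWordsLoose drops the <!ww> guard for comparison:

```raku
grammar TwoWords {
    # `regex` (not `token`) so <alpha>+ may backtrack and try every split point
    regex TOP { <alpha>+ <.ws> <alpha>+ }
}

# Hypothetical variant: same TOP, but ws without the <!ww> guard
grammar TwoWordsLoose is TwoWords {
    regex ws { \s* }
}

say so TwoWords.parse('ab cd');       # True
say so TwoWords.parse('abcd');        # False: every split point lies inside the word
say so TwoWordsLoose.parse('abcd');   # True: 'abcd' is split into 'abc' and 'd'
```

Without <!ww>, a plain \s* happily matches nothing in the middle of a word, so a single identifier can be misread as two tokens.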

For languages where that isn't suitable, it's simply a matter of overriding the default ws with whatever that language needs.
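
As an illustration of such an override, here is a sketch for a hypothetical toy language of sums in which # starts a comment running to the end of the line. The grammar and token names are made up, and this is one possible ws, not a standard one. It keeps the <!ww> guard and lets ws consume comments as well as whitespace:

```raku
grammar Sum {
    # Explicit <.ws> calls mark where whitespace is permitted
    token TOP     { <.ws> <operand> [ <.ws> '+' <.ws> <operand> ]* <.ws> }
    token operand { \d+ }
}

# Same grammar, but ws also skips '#' comments up to the end of the line
grammar SumWithComments is Sum {
    token ws { <!ww> [ \s+ | '#' \N* ]* }
}

my $src = "1 + 2 # add two\n+ 3";
say so Sum.parse($src);              # False: the default ws stops at '#'
say so SumWithComments.parse($src);  # True
```

Because both alternatives inside the override consume at least one character, the repeated group cannot loop forever on an empty match.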

Jonathan Worthington answered Sep 25 '22 21:09