According to the documentation, the default definition of the ws method in a grammar is to match zero or more whitespace characters, as long as that point is not within a word:
regex ws { <!ww> \s* }
What is the difference between this definition and the following:
regex ws { \s+ }
I wonder why the zero-width assertion <!ww> is used instead of the simpler \s+? I also note that the default definition allows matching zero whitespace characters, but when would that actually happen? Wouldn't it be clearer if it used \s+ instead of \s*?
In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area on a page.
Previously, a non-space character was defined as anything but a space (U+0020). Now it is anything that is not whitespace.
The ww assertion means that there are characters matching \w on either side of the current point. The ! inverts it, meaning <!ww> matches:
when there is a non-\w character before the current position (such as between "+" and "a")
when there is a non-\w character after the current position (such as between "a" and "+")
Effectively, then, it means that whitespace can never be considered to occur between two word characters. However, between two non-word characters, or between a word character and a non-word character, whitespace can be considered to occur.
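A quick way to see these cases is to try <!ww> directly in a plain regex. This is only a minimal sketch; the variable names are made up for illustration.

```raku
my $text = 'ab+cd';

# <!ww> is zero-width: it consumes nothing and only inspects the neighbours.
my $b-then-plus = so $text ~~ / b <!ww> '+' /;   # non-word char after the point
my $plus-then-c = so $text ~~ / '+' <!ww> c /;   # non-word char before the point
my $a-then-b    = so $text ~~ / a <!ww> b /;     # word chars on both sides

say $b-then-plus;   # True
say $plus-then-c;   # True
say $a-then-b;      # False
```

Only the last check fails, because that is the one position tested here that sits between two word characters.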
This follows what many languages we might wish to parse need. For example, consider ab+cd. The default ws will match either side of the +, but would not, for example, match within an identifier.
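To make the contrast with \s+ concrete, here is a small sketch that parses the same input with the default ws and with a ws that demands at least one whitespace character. The grammar names DefaultWs and StrictWs and the token name are hypothetical, chosen only for this example.

```raku
# Relies on the default ws inherited from Grammar.
grammar DefaultWs {
    rule  TOP  { <name> '+' <name> }
    token name { <[a..z]>+ }
}

# Hypothetical variant: the same rules, but ws must consume something.
grammar StrictWs is DefaultWs {
    token ws { \s+ }
}

my $default-tight  = so DefaultWs.parse('ab+cd');
my $default-spaced = so DefaultWs.parse('ab + cd');
my $strict-tight   = so StrictWs.parse('ab+cd');

say $default-tight;    # True
say $default-spaced;   # True
say $strict-tight;     # False
```

In a rule, every run of whitespace after an atom becomes a call to ws, so the \s+ version forces a space on both sides of the +. That is where matching zero whitespace characters matters: around punctuation, where a space is allowed but not required.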
For languages where that isn't suitable, it's simply a matter of overriding the default ws for whatever that language needs.
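As an illustration of such an override, suppose a line-oriented language in which a newline ends a statement, so ws should only skip horizontal whitespace. The grammar below is a hypothetical sketch under that assumption, not something taken from the documentation.

```raku
grammar KeyValue {
    token TOP  { <line>+ % \n }
    rule  line { <word> '=' <word> }
    token word { \w+ }
    # Overridden ws: horizontal whitespace only, so a newline is never skipped.
    token ws   { <!ww> \h* }
}

my $two-lines  = so KeyValue.parse("a = b\nc = d");
my $split-line = so KeyValue.parse("a =\nb");

say $two-lines;    # True
say $split-line;   # False
```

The second input fails because the overridden ws refuses to cross the newline between the = and the value.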