 

Matching non-whitespace characters in Perl 6

Tags: regex, raku

In Perl 6, you can use <.ws> to match whitespace characters. I want to match any character that doesn't match <.ws>, but I don't think I can use \S instead, because I believe it only excludes ASCII spaces, while <.ws> matches any Unicode space. How do I do this?

Kaiepi asked Apr 07 '19 23:04




1 Answer

A usage of <.ws> is a call to the ws token that does not capture its result. Its default behavior is:

token ws { <!ww> \s* }

Which means that:

  1. We must not be between two word (\w) characters
  2. Assuming that is true, there are zero or more whitespace characters at this point
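As a small sketch of those two conditions (the sample strings are made up for illustration), <.ws> can succeed without consuming anything when it sits between a word and a non-word character, but it fails outright between two word characters:

```raku
# Between 'a' and '+' we are not inside a word, so <.ws> matches zero characters.
say so "a+b" ~~ / a <.ws> '+' <.ws> b /;   # True

# Between 'a' and 'b' we are inside a word, so <!ww> fails and so does <.ws>.
say so "ab"  ~~ / a <.ws> b /;             # False

# With real whitespace in between, \s* consumes it.
say so "a b" ~~ / a <.ws> b /;             # True
```

This is why <.ws> is not a drop-in test for "is this character whitespace": it can match an empty string, and it can refuse to match depending on the neighbouring characters.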

In a given grammar, that can be overridden to specify the "whitespace" of the current language. In the Perl 6 language grammar, for example, ws includes parsing of comments, Pod, and even heredocs!
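To illustrate what such an override can look like, here is a minimal sketch. The KeyValue grammar, its key and value tokens, and the #-comment syntax are all hypothetical, chosen only for this example; this is not how the Perl 6 grammar itself defines ws.

```raku
grammar KeyValue {
    # In a rule, whitespace after an atom turns into an implicit <.ws> call.
    rule  TOP   { <key> '=' <value> }
    token key   { \w+ }
    token value { \d+ }

    # Hypothetical override: "whitespace" here also covers
    # '#' comments running to the end of the line.
    token ws    { <!ww> [ \s+ | '#' \N* ]* }
}

say so KeyValue.parse("answer # the key\n = 42");   # True
```

Because the meaning of ws depends on the grammar it is called in, "anything that doesn't match <.ws>" has no fixed definition in terms of characters.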

By contrast, \s is the character class for matching a single whitespace character, and \S means "not a whitespace character". This definition is Unicode based; if we do:

say .uniname for (0..0x10FFFF).map(*.chr).grep(/\s/)

Then we get:

<control-0009>
<control-000A>
<control-000B>
<control-000C>
<control-000D>
SPACE
<control-0085>
NO-BREAK SPACE
OGHAM SPACE MARK
EN QUAD
EM QUAD
EN SPACE
EM SPACE
THREE-PER-EM SPACE
FOUR-PER-EM SPACE
SIX-PER-EM SPACE
FIGURE SPACE
PUNCTUATION SPACE
THIN SPACE
HAIR SPACE
LINE SEPARATOR
PARAGRAPH SEPARATOR
NARROW NO-BREAK SPACE
MEDIUM MATHEMATICAL SPACE
IDEOGRAPHIC SPACE

Therefore, \S is most probably what you are looking for.
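A quick way to check this against non-ASCII spaces (the sample string is invented for the example):

```raku
# EM SPACE (U+2003) and NO-BREAK SPACE (U+00A0) used as separators
my $text = "foo\x[2003]bar\x[A0]baz";

# \S+ stops at each Unicode whitespace character, not only at ASCII spaces.
say $text.comb(/\S+/);        # (foo bar baz)

# IDEOGRAPHIC SPACE (U+3000) counts as whitespace, so \S does not match it.
say so "\x[3000]" ~~ /\S/;    # False
```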

Jonathan Worthington answered Oct 10 '22 16:10