my $book1 = "Don Quixote- Miguel de Cervantes";
my $book2 = "Les Misérables -Victor Hugo";
my $book3 = "War and Peace - Leo Tolstoy";
I want to use .subst to change "- " to " - " in $book1 and " -" to " - " in $book2. The problem is that I can't find the right regex to use with .subst. I could use something other than a regex, but I would like to use .subst. I can use a different regex for each string, but both should leave the " - " in $book3 untouched.
Sorry for the probably basic question. I've been trying different things but I always destroy part of the text.
You can use the trans method:
my $book1 = "Don Quixote- Miguel de Cervantes";
my $book2 = "Les Misérables -Victor Hugo";
my $book3 = "War and Peace - Leo Tolstoy";
for ($book1, $book2, $book3) -> $b {
say $b.trans([/<wb> '- '/, /' -' <wb>/] => [' - ']);
}
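<wb> matches at a word boundary, so neither pattern matches the " - " in $book3, where the hyphen has a space on both sides.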
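A quick way to check the word-boundary behaviour is to test the first pattern on its own. This is only a sketch; the variable names are made up for illustration and the strings are shortened versions of the ones in the question.

```raku
# Same first pattern as in the trans call above.
my $wb-dash = rx/ <wb> '- ' /;

# 'e' is a word character and '-' is not, so there is a boundary before the hyphen.
my $after-word  = ("Quixote- Miguel" ~~ $wb-dash).so;

# Here the hyphen is preceded by a space, so there is no boundary before it.
my $after-space = ("Peace - Leo" ~~ $wb-dash).so;

say $after-word;    # True
say $after-space;   # False
```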
TL;DR Another option to consider is using the <( and )> capture markers to pick out just the bit you want to replace.
Matching strictly per your examples:
/ \C[space] <( '- ' | ' -' )> \C[space] /
The syntax \c[...] specifies one or more characters by using their Unicode names inside the square brackets (in this case the classic ASCII space character).1
In this pattern I've used \C[...] (uppercase C, not lowercase c). There is a range of Raku "backslash" atoms and they all have lowercase and uppercase variants, where the uppercase variant matches any character except the one(s) matched by the lowercase variant. So \C[space] matches any character other than the ASCII space character. See \c / \C for more info.
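As a small illustration of the lowercase/uppercase difference (a sketch with made-up variable names, not part of the original pattern):

```raku
my $text = "a b";

# Lowercase \c[space] matches the space itself, which sits at offset 1.
my $lower = $text ~~ / \c[space] /;

# Uppercase \C[space] matches the first character that is NOT a space.
my $upper = $text ~~ / \C[space] /;

say $lower.from;   # 1
say ~$upper;       # a
```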
The <( capture marker marks the start point of the regex's capture. Likewise )> marks the endpoint.
Without them, when the pattern matches, the whole match would be captured, which would include whatever non-space character matches the \C[space] atom. We don't want that. So we use these markers to restrict what we capture.
Btw, each marker is independent. The above pattern matches \C[space] '- ' or ' -' \C[space]. If the pattern to the left of the | matches, only the <( has an impact, omitting whatever matched \C[space], and capturing until the end of the match, which for this alternative stops at the |. If the pattern to the right matches, capturing starts at the beginning of that alternative, immediately after the |, and ends at the )>.
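Since the question asked for .subst, here is a sketch of the strict pattern plugged into it. The names $dash-fix, @books and @fixed are just for illustration; the regex is the one shown above.

```raku
my $dash-fix = rx/ \C[space] <( '- ' | ' -' )> \C[space] /;

my @books =
    "Don Quixote- Miguel de Cervantes",
    "Les Misérables -Victor Hugo",
    "War and Peace - Leo Tolstoy";

# .subst replaces only the text between the capture markers,
# so the neighbouring letters are left alone.
my @fixed = @books.map: *.subst($dash-fix, ' - ');

.say for @fixed;
# Don Quixote - Miguel de Cervantes
# Les Misérables - Victor Hugo
# War and Peace - Leo Tolstoy
```

The third string comes back unchanged because its hyphen already has a space on both sides, so neither alternative can match.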
The | is Raku's parallel (aka "longest token match" -- LTM) pattern alternation operator, an alternative to the traditional sequential pattern alternation operator (which in Raku is written ||). In this case the set of substrings that the two operators will and won't match is the same, so it makes no difference which is used. But | is shorter than ||; when the match set is the same it's typically faster; and when the match sets are different it's often | that's desirable. So I use it by default unless I know I need the traditional sequential alternation logic (try pattern on left of || first; if that fails, try the pattern on the right of the ||).
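For a case where the two operators do differ, consider alternatives where one is a prefix of the other. This is a toy example, unrelated to the book titles:

```raku
# | picks the longest alternative that matches, regardless of order.
my $ltm = "foobar" ~~ / 'foo' |  'foobar' /;

# || tries alternatives left to right and stops at the first that matches.
my $seq = "foobar" ~~ / 'foo' || 'foobar' /;

say ~$ltm;   # foobar
say ~$seq;   # foo
```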
Matching more flexibly regarding whitespace:
/ \S <( '-' \s+ | \s+ '-' )> \S /
The \S atoms match any character that is not categorized by Unicode as being a whitespace character. (I use Raku, or tools such as this character property lookup web page, to explore what Unicode makes of a character.)
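A sketch of the flexible pattern with .subst, using variations of the question's strings with extra whitespace (the inputs and variable names are made up for illustration):

```raku
my $flex = rx/ \S <( '-' \s+ | \s+ '-' )> \S /;

# Several spaces after the hyphen.
my $a = "Don Quixote-   Miguel de Cervantes".subst($flex, ' - ');

# A space and a tab before the hyphen.
my $b = "Les Misérables \t-Victor Hugo".subst($flex, ' - ');

# Already well formed, so nothing matches.
my $c = "War and Peace - Leo Tolstoy".subst($flex, ' - ');

.say for $a, $b, $c;
# Don Quixote - Miguel de Cervantes
# Les Misérables - Victor Hugo
# War and Peace - Leo Tolstoy
```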
Comparing \C[space], \S, and <wb>:
\C[space] matches any character, including whitespace characters, with the sole exception of an ASCII space. My guess is it'll be the fastest of the three.
\S matches any non-whitespace. My guess is it'll be faster than <wb>.
<wb> matches between characters. Also it'll match before the first character in a string, and after the last one. So @chenyf's pattern would match and change '- foo...' to ' - foo...' and '...bar -' to '...bar - ' whereas the patterns with \C[space] or \S would not match at the start/end of those strings.
The \s+ atoms match one or more whitespace characters.
1 The naming is case insensitive. Multiple characters are separated by commas. \c[...] also works in a double quoted string (but not \C[...]).