I have this script:
use 5.014;
use warnings;
use utf8;
binmode STDOUT, ':utf8';
my $str = "XYZ ΦΨΩ zyz φψω";
my @greek = ($str =~ /\p{Greek}/g);
say "Greek: @greek";
my @upper = ($str =~ /\p{Upper}/g);
say "Upper: @upper";
#my @upper_greek = ($str =~ /\p{Upper+Greek}/); #wrong.
#say "Upper+Greek: @upper_greek";
Is it possible to combine multiple Unicode properties? For example, how do I select only the characters that are both Upper and Greek, and get the desired output:
Greek: Φ Ψ Ω φ ψ ω
Upper: X Y Z Φ Ψ Ω
Upper+Greek: Φ Ψ Ω #<-- how to get this?
We want to perform an AND operation, so we can't use
/(?:\p{Greek}|\p{Upper})/ # Greek OR Upper
or
/[\p{Greek}\p{Upper}]/ # Greek OR Upper
Since 5.18, one can use regex sets.
/(?[ \p{Greek} & \p{Upper} ])/ # Greek AND Upper
Before 5.36, this requires
use experimental qw( regex_sets );
It is safe to add this and use the feature as far back as its introduction as an experimental feature in 5.18, since no change has been made to the feature since then.
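For reference, here is a minimal complete sketch of the regex-set approach, reusing the string from the question. It assumes Perl 5.18 or later with the experimental module installed (it is on CPAN and bundled with newer Perls). The array name @upper_greek is simply borrowed from the commented-out line in the question.

```perl
use 5.018;
use warnings;
use utf8;                            # the source code contains literal UTF-8
use experimental qw( regex_sets );   # needed before 5.36, harmless afterwards
binmode STDOUT, ':utf8';

my $str = "XYZ ΦΨΩ zyz φψω";

# (?[ ... ]) is an extended character class; & intersects the two properties,
# so only characters that are both Greek and uppercase match.
my @upper_greek = $str =~ /(?[ \p{Greek} & \p{Upper} ])/g;

say "Upper+Greek: @upper_greek";     # prints: Upper+Greek: Φ Ψ Ω
```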
There are some other approaches that can be used in older versions of Perl, but they are indisputably harder to read.
One way of achieving AND in a regex is using lookarounds.
/\p{Greek}(?<=\p{Upper})/ # Greek AND Upper
Another way of getting an AND is to negate an OR. De Morgan's laws tell us
NOT( Greek AND Upper ) ⇔ NOT(Greek) OR NOT(Upper)
so
Greek AND Upper ⇔ NOT( NOT(Greek) OR NOT(Upper) )
This gives us
/[^\P{Greek}\P{Upper}]/ # Greek AND Upper
This is more efficient than using a lookbehind.
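As a quick sanity check, the two pre-5.18 patterns can be run side by side on the string from the question. This is only a sketch: the array names @via_lookbehind and @via_negation are made up for illustration, and it shows that both patterns select the same characters, not how they compare in speed.

```perl
use 5.014;
use warnings;
use utf8;                            # the source code contains literal UTF-8
binmode STDOUT, ':utf8';

my $str = "XYZ ΦΨΩ zyz φψω";

# Match a Greek character, then look back at that same character
# and require it to be uppercase as well.
my @via_lookbehind = $str =~ /\p{Greek}(?<=\p{Upper})/g;

# De Morgan: exclude everything that is non-Greek or non-Upper,
# which leaves only characters that are both Greek and Upper.
my @via_negation = $str =~ /[^\P{Greek}\P{Upper}]/g;

say "Lookbehind: @via_lookbehind";   # prints: Lookbehind: Φ Ψ Ω
say "Negation:   @via_negation";     # prints: Negation:   Φ Ψ Ω
```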
A user-defined character property works in 5.14.0 as well:
sub InUpperGreek {
return <<'END'
+utf8::Greek
&utf8::Upper
END
}
my @upper_greek = ($str =~ /\p{InUpperGreek}/g);
say "Upper Greek: @upper_greek";
Not sure if that's simpler. :) For more information on how this works, see the perlunicode documentation on user-defined character properties.