When I run the following code, it does not enter the "do something here" section:
my $a ='µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
my $b ='µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
if ($a ne $b) {
# do something here
}
Is there another way to compare Unicode strings in Perl?
Perl offers several ways to compare string values. One is the "cmp" operator, and another is the set of string comparison operators "eq", "ne", "lt", and "gt". The "==" operator is for numeric comparison only.
The 'eq' operator is one of Perl's string comparison operators. It checks whether the string on its left is stringwise equal to the string on its right.
While Perl does not implement the Unicode standard or the accompanying technical reports from cover to cover, Perl does support many Unicode features. Also, the use of Unicode may present security issues that aren't obvious; see "Security Implications of Unicode" in perlunicode.
"== does a numeric comparison: it converts both arguments to a number and then compares them."
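To make the difference between the numeric and string operators concrete, here is a minimal sketch. The sample values and variable names are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample values, chosen only for illustration.
my $x = "1.0";
my $y = "1";

# Numeric comparison: both sides are converted to numbers first,
# so 1.0 and 1 compare as the same number.
my $num_equal = ($x == $y) ? 1 : 0;

# String comparison: the operands are compared as strings,
# so "1.0" and "1" are different.
my $str_equal = ($x eq $y) ? 1 : 0;

# cmp returns -1, 0, or 1 depending on string order,
# which is why it is commonly used inside sort blocks.
my $order = "apple" cmp "banana";

print "numeric equal: $num_equal\n";   # prints 1
print "string equal:  $str_equal\n";   # prints 0
print "cmp result:    $order\n";       # prints -1
```

For text, including Unicode text, the string operators are the ones to reach for.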
If you have two Unicode strings (i.e. strings of Unicode code points), then you have presumably saved your file as UTF-8 and your code actually looks like this:
use utf8; # Tell Perl source code is UTF-8.
my $a = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
my $b = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
if ($a eq $b) {
print("They're equal.\n");
} else {
print("They're not equal.\n");
}
And that works perfectly fine. eq and ne compare the strings code point by code point.
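The comparison is only meaningful if both sides really are strings of code points. As a hedged illustration, assume one string arrives as raw UTF-8 bytes (for example from a file or socket, which the question does not state) and has to be decoded with the core Encode module before comparing. The variable names are made up for this sketch.

```perl
use strict;
use warnings;
use Encode qw( decode );

# The UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
my $bytes = "\xC3\xA9";

# Decoding turns those bytes into a one-character string of code points.
my $chars = decode('UTF-8', $bytes);

# The same character written directly as a code point escape.
my $expected = "\x{E9}";

# Undecoded bytes are a two-character string, so they do not match.
my $raw_equal     = ($bytes eq $expected) ? 1 : 0;
# After decoding, both sides hold the single code point U+00E9.
my $decoded_equal = ($chars eq $expected) ? 1 : 0;

printf "length of raw bytes:   %d\n", length $bytes;   # prints 2
printf "length after decoding: %d\n", length $chars;   # prints 1
print  "raw equal:     $raw_equal\n";                  # prints 0
print  "decoded equal: $decoded_equal\n";              # prints 1
```

The use utf8 pragma plays the same role for string literals in the source file that decode plays here for external input.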
Certain graphemes (e.g. "é") can be built in several different ways, so you might have to normalize their representation first.
use utf8; # Tell Perl source code is UTF-8.
use charnames qw( :full ); # For \N{}
use Unicode::Normalize qw( NFC );
my $a = NFC("\N{LATIN SMALL LETTER E WITH ACUTE}");
my $b = NFC("e\N{COMBINING ACUTE ACCENT}");
if ($a eq $b) {
print("They're equal.\n");
} else {
print("They're not equal.\n");
}
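To see why the normalization step matters, the following sketch compares the same two spellings of "é" with and without it. The helper name same_text is hypothetical, not part of Unicode::Normalize.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFD );

my $precomposed = "\x{E9}";     # LATIN SMALL LETTER E WITH ACUTE
my $decomposed  = "e\x{301}";   # "e" followed by COMBINING ACUTE ACCENT

# Hypothetical helper: compare two strings after NFC normalization.
sub same_text {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right) ? 1 : 0;
}

# Without normalization the code point sequences differ.
my $plain_equal = ($precomposed eq $decomposed) ? 1 : 0;

# With normalization they match; NFD works as well as NFC,
# as long as both sides use the same form.
my $nfc_equal = same_text($precomposed, $decomposed);
my $nfd_equal = (NFD($precomposed) eq NFD($decomposed)) ? 1 : 0;

print "plain eq: $plain_equal\n";   # prints 0
print "NFC eq:   $nfc_equal\n";     # prints 1
print "NFD eq:   $nfd_equal\n";     # prints 1
```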
Finally, Unicode considers certain characters almost equivalent, and they can be considered equal using a different form of normalization.
use utf8; # Tell Perl source code is UTF-8.
use charnames qw( :full ); # For \N{}
use Unicode::Normalize qw( NFKC );
my $a = NFKC("2");
my $b = NFKC("\N{SUPERSCRIPT TWO}");
if ($a eq $b) {
print("They're equal.\n");
} else {
print("They're not equal.\n");
}
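The difference between the two normalization families can be checked directly. This sketch, with illustrative variable names, shows that canonical normalization (NFC) keeps "2" and SUPERSCRIPT TWO distinct, while compatibility normalization (NFKC) folds them together.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFKC );

my $plain = "2";
my $super = "\x{B2}";     # SUPERSCRIPT TWO
my $lig   = "\x{FB01}";   # LATIN SMALL LIGATURE FI

# Canonical normalization does not touch compatibility characters.
my $nfc_equal  = (NFC($plain)  eq NFC($super))  ? 1 : 0;

# Compatibility normalization replaces them with their plain equivalents.
my $nfkc_equal = (NFKC($plain) eq NFKC($super)) ? 1 : 0;

# The ligature expands to the two letters "f" and "i" under NFKC.
my $lig_equal  = (NFKC($lig) eq "fi") ? 1 : 0;

print "NFC eq:      $nfc_equal\n";    # prints 0
print "NFKC eq:     $nfkc_equal\n";   # prints 1
print "ligature eq: $lig_equal\n";    # prints 1
```

Because NFKC discards distinctions such as superscripts and ligatures, it is better suited to matching and searching than to text that will be stored or displayed.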