Comparing two Unicode strings with Perl

Tags:

unicode

perl

When I run the following code, it does not enter the "do something here" section:

my $a ='µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
my $b ='µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';

if ($a ne $b) {
    # do something here    
}

Is there another way to compare Unicode strings with Perl?

asked Mar 05 '12 21:03

smith

1 Answer

If you have two Unicode strings (i.e. strings of Unicode code points), then your source file is presumably saved as UTF-8 and the code you actually ran looked like this:

use utf8;  # Tell Perl source code is UTF-8.

my $a = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';
my $b = 'µ╫P[┐╬♣3▀═<+·1╪מ└╖"ª';

if ($a eq $b) {
    print("They're equal.\n");
} else {
    print("They're not equal.\n");
}

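And that works perfectly fine. eq and ne compare the strings code point by code point. The two strings in the question are identical, so ne is false and the "do something here" block is correctly skipped.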
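To see what "code point by code point" means in practice, here is a minimal sketch that lists the code points of two strings that render the same. The helper name code_points is made up for this illustration and is not part of any module; the strings are written with \x{} escapes so the example does not depend on the file's encoding.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): list the code points
# of a string as U+XXXX labels.
sub code_points {
    my ($str) = @_;
    return map { sprintf("U+%04X", ord($_)) } split(//, $str);
}

my $composed   = "\x{E9}";     # "é" as one precomposed code point
my $decomposed = "e\x{301}";   # "e" followed by COMBINING ACUTE ACCENT

print join(" ", code_points($composed)),   "\n";   # U+00E9
print join(" ", code_points($decomposed)), "\n";   # U+0065 U+0301

# eq sees two different code point sequences, so this reports a mismatch.
print $composed eq $decomposed ? "eq: equal\n" : "eq: not equal\n";
```

Both strings display as "é", but eq treats them as different because the underlying sequences differ.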

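Certain graphemes (e.g. "é") can be represented by more than one sequence of code points (a single precomposed character, or a base letter followed by a combining mark), so you might have to normalize both strings to the same form first.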

use utf8;  # Tell Perl source code is UTF-8.

use charnames          qw( :full );  # For \N{}
use Unicode::Normalize qw( NFC );

my $a = NFC("\N{LATIN SMALL LETTER E WITH ACUTE}");
my $b = NFC("e\N{COMBINING ACUTE ACCENT}");

if ($a eq $b) {
    print("They're equal.\n");
} else {
    print("They're not equal.\n");
}

Finally, Unicode considers certain characters almost equivalent (compatibility equivalence), and they compare as equal after a different form of normalization, such as NFKC.

use utf8;  # Tell Perl source code is UTF-8.

use charnames          qw( :full );  # For \N{}
use Unicode::Normalize qw( NFKC );

my $a = NFKC("2");
my $b = NFKC("\N{SUPERSCRIPT TWO}");

if ($a eq $b) {
    print("They're equal.\n");
} else {
    print("They're not equal.\n");
}
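The two normalization approaches above could be wrapped in one small helper. This is only a sketch: same_text and its optional $form argument are names invented here, and it assumes both arguments are already decoded strings of code points rather than raw UTF-8 bytes. The // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;
use Unicode::Normalize qw( NFC NFKC );

# Hypothetical helper: normalize both strings the same way, then compare.
# $form is 'NFC' (canonical equivalence, the default in this sketch)
# or 'NFKC' (compatibility equivalence).
sub same_text {
    my ($x, $y, $form) = @_;
    $form //= 'NFC';
    my $normalize = $form eq 'NFKC' ? \&NFKC : \&NFC;
    return $normalize->($x) eq $normalize->($y);
}

# "é" precomposed vs. "e" + COMBINING ACUTE ACCENT
print same_text("\x{E9}", "e\x{301}")   ? "equal\n" : "not equal\n";   # equal

# "2" vs. SUPERSCRIPT TWO (U+00B2)
print same_text("2", "\x{B2}")          ? "equal\n" : "not equal\n";   # not equal
print same_text("2", "\x{B2}", 'NFKC')  ? "equal\n" : "not equal\n";   # equal
```

Defaulting to NFC keeps the comparison strict; NFKC is opt-in because it folds away distinctions (such as superscripts) that may matter to the caller.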

answered Nov 22 '22 21:11

ikegami
