
Correct and portable utf8 filename normalization

I have another Perl/UTF-8 question:

Code:

use 5.012;
use utf8;
use strict;
use warnings;
use feature qw(unicode_strings);

use open qw(:std :utf8);
use Encode qw(encode decode);
use charnames qw(:full);
use Unicode::Normalize qw(NFD NFC);

my $name = "\N{U+00C1}";        # Á (UPPERCASE A WITH ACUTE)

opendir(my $dh, ".") || die "error opendir";
while(readdir $dh) {
    say "ENC-OK" if      decode('UTF-8', $_)   =~ $name; #never true
    say "NFC-OK" if NFC( decode('UTF-8', $_) ) =~ $name; #true
}
closedir $dh;

The above code prints NFC-OK for every file whose name contains Á. But on a file system that stores names in NFD it never prints ENC-OK, because readdir never returns Á as the single code point U+00C1; it returns "A" followed by a combining acute accent (U+0301).
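To make the difference concrete, here is a minimal sketch that needs no file system at all. The helper name code_points is made up for this illustration; it just lists the code points of a string so the two forms can be compared.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper (not from the original code): render a string
# as a space-separated list of its code points.
sub code_points {
    my ($str) = @_;
    return join ' ', map { sprintf 'U+%04X', ord($_) } split //, $str;
}

my $composed   = "\N{U+00C1}";     # Á as a single code point
my $decomposed = NFD($composed);   # "A" followed by a combining acute accent

print "composed:   ", code_points($composed),   "\n";
print "decomposed: ", code_points($decomposed), "\n";

# Perl's regex engine compares code points, not canonical equivalents,
# so the decomposed form only matches after normalizing it.
print "raw match: ", ($decomposed      =~ /\Q$composed\E/ ? "yes" : "no"), "\n";
print "NFC match: ", (NFC($decomposed) =~ /\Q$composed\E/ ? "yes" : "no"), "\n";
```

The raw comparison fails for the same reason ENC-OK is never printed: the two strings are canonically equivalent but consist of different code points.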

Question: how do I write the above code correctly, so that it is portable to any OS?

cajwine asked Jun 01 '12 20:06




1 Answer

More specifically,

NFC( decode('UTF-8', $_) ) =~ quotemeta( NFC( $name ) )

and

NFD( decode('UTF-8', $_) ) =~ quotemeta( NFD( $name ) )

both work for every file name, regardless of its form.

...Well, as long as it's UTF-8 encoded. That won't be the case on Windows, except maybe when using chcp 65001.
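Put together, the comparison could be wrapped in a small helper. This is only a sketch under the same assumption as above (readdir hands back UTF-8 encoded bytes). The name name_contains is made up for the example, and it uses index on the normalized strings instead of a regex with quotemeta, which amounts to the same literal substring test. Names that are not valid UTF-8 are simply treated as non-matching.

```perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: true if the raw directory entry contains $needle,
# with both sides compared in NFC.
# Assumption: the file system returns UTF-8 encoded bytes.
sub name_contains {
    my ($raw_name, $needle) = @_;
    my $name = eval {
        # FB_CROAK makes decode die on malformed input;
        # LEAVE_SRC keeps it from modifying $raw_name.
        decode('UTF-8', $raw_name, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return 0 unless defined $name;   # not valid UTF-8: treat as no match
    return index(NFC($name), NFC($needle)) >= 0 ? 1 : 0;
}

my $needle = "\N{U+00C1}";           # Á

opendir(my $dh, '.') or die "opendir failed: $!";
while (defined(my $entry = readdir $dh)) {
    # $entry is still the raw byte string, so it is printed unchanged.
    print "$entry\n" if name_contains($entry, $needle);
}
closedir $dh;
```

Normalizing to NFD on both sides would work just as well; what matters is that the file name and the search string end up in the same form before they are compared.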

ikegami answered Sep 21 '22 14:09
