 

Handling unicode directory and filenames in Perl on Windows

I have an encoding problem with Perl on Windows. On Windows 7, running Strawberry Perl 5.16 with a simple Tk GUI, I need to open files and access directories that have non-English characters in their names or paths. For opening files, I've come up with this solution, which seems to work fine:

#!/usr/bin/perl -w

use strict;
use warnings;
use Win32::Unicode::File;
use Encode;
use Tk;

my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
    -text => 'Open file',
    -command =>  [ \&select_unicode_file ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();

sub select_unicode_file {
    my $types = [
        [ 'Txt',       '.txt' ],
        [ 'All Files', '*' ],
    ];
    my $input_file = $mw->getOpenFile( -filetypes => $types );
    my $fh = Win32::Unicode::File->new;
    if ( $fh->open( '<', $input_file ) ) {
        while ( my $line = $fh->readline() ) {
            print "\n$line\n";
        }
        close $fh;
    }
    else {
        print "Couldn't open file: $!\n";
    }
}

This correctly opens files such as Поиск/Поиск.txt

What I CANNOT do is simply get a directory path and then process it. I think I should use Win32::Unicode::Dir, but I really can't understand the documentation.

It should be something like this:

#!/usr/bin/perl -w

use strict;
use warnings;
use Win32::Unicode::Dir;
use Encode;
use Tk;

my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
    -text => 'Open file',
    -command =>  [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();

sub select_unicode_directory {
    my $dir = $mw->chooseDirectory();
    my $wdir = Win32::Unicode::Dir->new;
    my $dir = $wdir->open($dir) || die $wdir->error;
    my $dir_complete = "$dir/a.txt";
    open (MYFILE, $dir_complete );
    while (<MYFILE>) {
        chomp;
        print "$_\n";
    }
    close (MYFILE);
}
Kelly o'Brian asked Jun 28 '13 17:06



1 Answer

There is a logical error in:

my $dir = $wdir->open($dir) || die $wdir->error;
my $dir_complete = "$dir/a.txt";

$wdir->open('path') returns an object, not a string, so you can't use the result as a path. But that is not the worst of it. Sadly, the Tk implementation does not seem to support Unicode file names yet (chooseDirectory included). I guess you will have to write a custom directory selector, but I'm not sure that's even possible.

The script below can list the files in a folder whose path is plain ASCII (and ->fetch can return Unicode file names), but it crashes when opening a folder with non-ASCII characters in its name. Well, to be fair, it crashes when opening ?????.

use strict;
use warnings;
use Win32::Unicode::Dir;
use Win32::Unicode::Console;
use Encode;
use Tk;

my $mw = Tk::MainWindow->new;
my $tissue_but = $mw->Button(
    -text => 'Select dir',
    -command =>  [ \&select_unicode_directory ],
);
$tissue_but->grid( -row => 3, -column => 1 );
Tk::MainLoop();

sub select_unicode_directory {
    my $wdir = Win32::Unicode::Dir->new;
    my $selected = $mw->chooseDirectory(-parent =>$mw);
    # http://search.cpan.org/dist/Tk/pod/chooseDirectory.pod#CAVEATS
    $selected = encode("utf-8", $selected);
    print "selected: $selected\n";

    $wdir->open($selected) || die $wdir->error;

    print "\$mw->chooseDirectory:    $selected\n";
    print "\$wdir->open(\$selected): $wdir\n";


    # CRASH HERE, presumably because Windows can't handle '?' in a file (dir) name
    for ($wdir->fetch) {
        # http://search.cpan.org/~xaicron/Win32-Unicode-0.38/lib/Win32/Unicode/Dir.pm
        next if /^\.{1,2}$/;
        my $path = "$selected/$_";
        if (file_type('f', $path)) { print "file: $path\n"; } 
        elsif (file_type('d', $path)) { print " dir: $path\n"; }
    }
    print "closing \n";
    $wdir->close || die $wdir->error;

}

Both samples below were run using Strawberry Perl 5.12.3 built for MSWin32-x64-multi-thread.

Sample output (opening Поиск/):

selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$mw->chooseDirectory:    C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/?????
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2e38158)
>>> perl crash <<<

Sample output (opening parent of Поиск):

selected: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$mw->chooseDirectory:    C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk
$wdir->open($selected): Win32::Unicode::Dir=HASH(0x2b92c10)
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/.select_uni_dir.pl.swp
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/o.dir
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_dir.pl
file: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/select_uni_file.pl
 dir: C:/cygwin/home/jaroslav/tmp/so/perl/open-file-tk/Поиск

Conclusion

The Tk dir selector returns ????? instead of Поиск. While looking for a way to enable Unicode in Tk, I found this:

http://search.cpan.org/dist/Tk/pod/UserGuide.pod#Perl/Tk_and_Unicode :

(...) Unfortunately, there are still places in Perl ignorant of Unicode. One of these places are filenames. Consequently, the file selectors in Perl/Tk do not handle encoding of filenames properly. Currently they suppose that filenames are in iso-8859-1 encoding, at least on Unix systems. As soon as Perl has a concept of filename encodings, then Perl/Tk will also implement such schemes.
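The ????? above is what you would expect if the path gets squeezed through a single-byte encoding somewhere along the way. I haven't verified that this is exactly what Tk does internally, so treat the following as an illustration of the effect only. The helper name to_latin1 is made up for this example; it uses nothing but the core Encode module:

```perl
use strict;
use warnings;
use utf8;                         # this source file contains UTF-8 literals
use Encode qw(encode decode);

# Hypothetical helper: force a character string into iso-8859-1 bytes.
# Characters that iso-8859-1 cannot represent become '?'.
sub to_latin1 {
    my ($name) = @_;
    return encode('iso-8859-1', $name);
}

my $name  = "Поиск";
my $lossy = to_latin1($name);            # Cyrillic has no iso-8859-1 mapping
my $again = encode('utf-8', $lossy);     # encoding afterwards restores nothing

# A UTF-8 round trip, by contrast, is lossless.
my $round = decode('utf-8', encode('utf-8', $name));

print "lossy: $lossy\n";
print "still lossy after utf-8 encode\n" if $again eq $lossy;
print "utf-8 round trip ok\n"            if $round eq $name;
```

Once the name has been reduced to question marks, the original characters are gone, which is why calling encode("utf-8", $selected) on the value returned by chooseDirectory cannot bring Поиск back.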

So at first glance, it seems that what you're trying to do is impossible (unless you write or find a custom directory selector). Actually, it may be worth reporting this as a bug, because the GUI did display "Поиск", so the error is in the return value.
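Coming back to the logical error from the start of this answer: if you do manage to get a correctly decoded path from somewhere (a custom selector, a config file, and so on), a minimal sketch of the directory handling could look like this. It only uses calls that already appear in this thread (new, open, fetch, close and error on Win32::Unicode::Dir; new, open and readline on Win32::Unicode::File). The sub name process_unicode_directory is hypothetical, a.txt is the file from the question, and I am assuming $path is a proper character string rather than the ????? that Tk hands back:

```perl
use strict;
use warnings;
use Win32::Unicode::Dir;
use Win32::Unicode::File;

# Hypothetical helper: list a directory, then read "$path/a.txt".
# Assumes $path is a correctly decoded character string.
sub process_unicode_directory {
    my ($path) = @_;

    # $wdir is the directory handle; $path stays a plain string.
    my $wdir = Win32::Unicode::Dir->new;
    $wdir->open($path) || die $wdir->error;
    my @entries = grep { !/^\.{1,2}$/ } $wdir->fetch;   # skip . and ..
    $wdir->close || die $wdir->error;

    # Build the file name from the path string, not from the handle object.
    my $file = "$path/a.txt";
    my $fh   = Win32::Unicode::File->new;
    if ( $fh->open( '<', $file ) ) {
        while ( my $line = $fh->readline() ) {
            print $line;
        }
        close $fh;
    }
    else {
        print "Couldn't open a.txt in the selected directory\n";
    }

    return @entries;
}
```

The point is simply to keep the handle and the path in separate variables, and to go through Win32::Unicode::File for the read instead of the built-in open.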

Ярослав Рахматуллин answered Sep 18 '22 19:09
