For example, given an empty file テスト.txt
, how would I make a copy called テスト.txt.copy
?
My first crack at it managed to access the file and create the new filename, but the copy generated テスト.txt.copy
.
Here was my first crack at it:
#!/usr/bin/env perl
use strict;
use warnings;
use English '-no_match_vars';
use File::Basename;
use Getopt::Long;
use File::Copy;
use Win32;
my (
$output_relfilepath,
) = process_command_line();
open my $fh, '>', $output_relfilepath or die $!;
binmode $fh, ':utf8';
foreach my $short_basename ( glob( '*.txt') ) {
# skip the output basename if it's in the glob
if ( $short_basename eq $output_relfilepath ) {
next;
}
my $long_basename = Win32::GetLongPathName( $short_basename );
my $new_basename = $long_basename . '.copy';
print {$fh} sprintf(
"short_basename = (%s)\n" .
" long_basename = (%s)\n" .
" new_basename = (%s)\n",
$short_basename,
$long_basename,
$new_basename,
);
copy( $short_basename, $new_basename );
}
printf(
"\n%s done! (%d seconds elapsed)\n",
basename( $0 ),
time() - $BASETIME,
);
# === subroutines ===
sub process_command_line {
# default arguments
my %args
= (
output_relfilepath => 'output.txt',
);
GetOptions(
'help' => sub { print usage(); exit },
'output_relfilepath=s' => \$args{output_relfilepath},
);
return (
$args{output_relfilepath},
);
}
sub usage {
my $script_name = basename $0;
my $usage = <<END_USAGE;
======================================================================
Test script to copy files with a UTF-8 filenames to files with
different UTF-8 filenames. This example tries to make copies of all
.txt files with versions that end in .txt.copy.
usage: ${script_name} (<options>)
options:
-output_relfilepath <s> set the output relative file path to <s>.
this file contains the short, long, and
new basenames.
(default: 'output.txt')
----------------------------------------------------------------------
examples:
${script_name}
======================================================================
END_USAGE
return $usage;
}
Here are the contents of output.txt
after execution:
short_basename = (BD9A~1.TXT)
long_basename = (テスト.txt)
new_basename = (テスト.txt.copy)
I've tried replacing File::Copy's copy command with a system call:
my $cmd = "copy \"${short_basename}\" \"${new_basename}\"";
print `$cmd`;
and with Win32::CopyFile:
Win32::CopyFile( $short_basename, $new_basename, 'true' );
Unfortunately, I get the same result in both cases (テスト.txt.copy
). For the system call, the print shows 1 file(s) copied.
as expected.
Notes:
This should be possible with the CopyFileW function from Win32API::File, which should be included with Strawberry. I've never messed with Unicode filenames myself, so I'm not sure of the details. You might need to use Encode to manually convert the filename to UTF-16LE (encode('UTF16-LE', $filename)
).
You're getting the long filename using Win32
, which gives you a UTF-8-encoded string.
However, you're then setting the long filename using plain copy
, which uses the C stdlib IO functions. The stdlib functions use the default filesystem encoding.
On modern Linuxes that's usually UTF-8, but on Windows it (sadly) never is, because the system default code page cannot be set to UTF-8. So you'll get your UTF-8 string interpreted as a code page 1252 string on a Western European Windows install, as has happened here. (On a Japanese machine it'd get interpreted as code page 932 — like Shift-JIS — which would come out something like 繝�せ繝�
.)
I've not done this in Perl, but I'd suspect the Win32::CopyFile
function would be more likely to be able to handle the kind of Unicode paths returned elsewhere in the Win32
module.
Use Encode::Locale:
use Encode::Locale;
use Encode;
use File::Copy;
copy( encode(locale_fs => $short_basename),
encode(locale_fs => $new_basename) ) || die $!;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With