
Failed to check whether a file with a German name exists in the file system

Background:

I have two machines: one running German Windows 7, and my PC running English Windows 7 (with a Hebrew locale).
In my Perl code I'm trying to check whether a file that I got from the German machine exists on my machine.
The file name is ßßßzllpoöäüljiznppü.txt

Why does the following code fail?

use Encode;
use Encode::locale;

sub UTF8ToLocale
{
  my $str = decode("utf8",$_[0]);
  return encode(locale, $str);
}

if(!-e UTF8ToLocale($read_file))
{
   print "failed to open the file";
}
else
{
   print $read_file;
}

The same thing happens when I try to open the file:

open (wtFile, ">", UTF8ToLocale($read_file));  
binmode wtFile;
shift @_;
print wtFile @_;
close wtFile;

The file name is converted from German to UTF-8 in my Java application and then passed to the Perl script. The Perl script takes this file name and converts it from UTF-8 to the system locale (see the UTF8ToLocale($read_file) call), and I believe that is where the problem lies.

Questions:
What charset encoding does the OS file system use?
When I create a file with a German name on an OS whose locale is Hebrew, in which charset is the name saved?
How do I solve this problem?

Update:

Here is another piece of code that I ran with a hard-coded file name on my PC; the script file is UTF-8 encoded:

use Encode;
use Encode::locale;

my $string = encode("utf-16",decode("utf8","C:\\TestPerl\\ßßßzllpoöäüljiznppü.txt"));

if (-e $string)
{
  print "exists\r\n";
}
else
{
  print "not exists\r\n"
}

The output is "not exists". I also tried other charsets (cp1252, cp850, utf-16le); nothing works. If I change the file name to English or Hebrew (my default locale), it works. Any ideas?

Snow asked Jan 01 '26 21:01


1 Answer

Windows 7 uses UTF-16 internally [citation needed] (I don't remember the byte order), so you don't need to convert file names. However, if you transport the file via a FAT file system (e.g. an old USB stick) or another file system that is not Unicode-aware, this benefit is lost.

The locale setting you are talking about only affects the language of the user interface and the apparent folder names (Programme (x86) vs. Program Files (x86) with the latter being the real name in the file system).

The larger problem I can see is the internal encoding of the file contents that you want to transfer as some applications may default to different encodings depending on the locale. There is no solution to that except being explicit when the file is created. Sticking to UTF-8 is generally a good idea.

And why do you convert the file names with another tool? Any Unicode encoding should be sufficient for transfer.
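
To illustrate why re-encoding the name into a locale charset can lose information, here is a minimal sketch. It assumes that the German machine's legacy code page is cp1252 and the Hebrew one is cp1255; the question names neither, so treat both as assumptions. The helper name representable is made up for this example. It only asks Perl's Encode module whether every character of the name can be mapped into a given code page; it does not touch the file system.

```perl
use strict;
use warnings;
use Encode qw(encode);

# File name from the question, written with escapes so the script does not
# depend on its own source encoding
# (\x{DF} = ß, \x{F6} = ö, \x{E4} = ä, \x{FC} = ü).
my $name = "\x{DF}\x{DF}\x{DF}zllpo\x{F6}\x{E4}\x{FC}ljiznpp\x{FC}.txt";

# Hypothetical helper: true if every character of $text exists in $codepage.
sub representable {
    my ($codepage, $text) = @_;
    # FB_CROAK makes encode() die on an unmappable character. It may also
    # modify its input, which is why $text is a private copy here.
    return defined eval { encode($codepage, $text, Encode::FB_CROAK()) };
}

# cp1252 and cp1255 are assumed code pages, not taken from the question.
for my $codepage (qw(cp1252 cp1255)) {
    printf "%s: %s\n", $codepage,
        representable($codepage, $name) ? "representable" : "not representable";
}
```

If the Hebrew code page turns out to have no slot for ß, ö, ä and ü, any conversion of the name into that code page has to substitute or drop those characters, and the resulting bytes no longer name the original file.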


Your script does not work because you reference an undefined global variable called $read_file. Assuming your second code block is not enclosed in any scope, especially not in a sub, the @_ variable is not available. To get command-line arguments, you should consider using the @ARGV array. The logic of your script isn't clear anyway: you print error messages to STDOUT instead of STDERR; you "decode" the file name and then print the non-decoded string in your else branch; and you are paranoid about encodings (which is generally good) but don't specify an encoding for your output stream, etc.
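
A minimal sketch of those points, not a fix for the file lookup itself: it takes the name from @ARGV, decodes it once, declares an encoding for both output streams, and sends the error message to STDERR. The helper name decoded_file_name is made up for this example, and the sketch assumes the caller really delivers UTF-8 bytes, which on Windows depends on how the script is launched.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Say explicitly how characters are written to each output stream.
binmode STDOUT, ':encoding(UTF-8)';
binmode STDERR, ':encoding(UTF-8)';

# Hypothetical helper: decode the first argument from UTF-8 bytes into a
# Perl character string, or return undef when no argument was given.
sub decoded_file_name {
    my @args = @_;
    return undef unless @args;
    return decode('UTF-8', $args[0]);
}

my $read_file = decoded_file_name(@ARGV);
if (defined $read_file) {
    # Print the decoded string, not the raw bytes that came in.
    print "file name: $read_file\n";
}
else {
    # Diagnostics go to STDERR, not STDOUT.
    print STDERR "usage: $0 <UTF-8 encoded file name>\n";
}
```

Keeping the decode step in one small sub makes it obvious which variables hold bytes and which hold characters, which is the distinction the original script blurs.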

amon answered Jan 03 '26 10:01


