
How can I properly use environment variables encoded as Windows-1252 in Perl?

I have an environment variable set in Windows as TEST=abc£, encoded in the Windows-1252 code page. When I run a Perl program, test1.pl, the value arrives intact.

When I call another Perl script, test2.pl, from test1.pl, either with system(..) or Win32::Process, the value arrives garbled.

Can someone explain why this happens and how to resolve it?

The version of perl I am using is 5.8.

If my understanding is right, Perl internally uses UTF-8, so the initial process, test1.pl, received the value correctly converted from Windows-1252 to UTF-8. When we call another process, should we convert back to the Windows-1252 code page?

Kartlee asked Mar 13 '10 09:03


1 Answer

This has nothing to do with Perl's internal string encoding, but with the need to properly decode data coming from the outside. I'll provide the test case. This is Strawberry Perl 5.10 on a Western European Windows XP.

test1.pl:

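use Devel::Peek;
use Encode qw(decode);

# Raw bytes as Perl received them from the environment
print Dump $ENV{TEST};
# Decode explicitly into a Perl character string
my $var = decode 'Windows-1252', $ENV{TEST};
print Dump $var;

# Run the second script as a child process
system "B:/sperl/perl/bin/perl.exe B:/test2.pl";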

test2.pl:

use Devel::Peek;
use Encode qw(decode);

# Raw bytes as the child process received them
print Dump $ENV{TEST};
# using Windows-1252 again is wrong here
my $var = decode 'IBM850', $ENV{TEST};
print Dump $var;

Execute:

> set TEST=abc£
> B:\sperl\perl\bin\perl.exe B:\test1.pl

Output (shortened):

SV = PVMG(0x982314) at 0x989a24
  FLAGS = (SMG, RMG, POK, pPOK)
  PV = 0x98de0c "abc\243"\0
SV = PV(0x3d6a64) at 0x989b04
  FLAGS = (PADMY, POK, pPOK, UTF8)
  PV = 0x9b5be4 "abc\302\243"\0 [UTF8 "abc\x{a3}"]
SV = PVMG(0x982314) at 0x989a24
  FLAGS = (SMG, RMG, POK, pPOK)
  PV = 0x98de0c "abc\243"\0
SV = PV(0x3d6a4c) at 0x989b04
  FLAGS = (PADMY, POK, pPOK, UTF8)
  PV = 0x9b587c "abc\302\243"\0 [UTF8 "abc\x{a3}"]

You are bitten by the fact that Windows uses a different encoding for the text environment (the console's OEM code page, IBM850 here) than for the graphical environment (the ANSI code page, Windows-1252). An expert has to explain the deeper details of that phenomenon.
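To make the difference between the two code pages concrete, here is a minimal sketch. The byte values are taken from the published code page tables, not from the test run above, and the variable names are only for illustration. The pound sign is a different byte in each code page, so decoding with the wrong table silently yields a different character.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "abc£" as bytes in each code page (values from the code page tables)
my $ansi_bytes = "abc\xA3";   # Windows-1252: 0xA3 is the pound sign
my $oem_bytes  = "abc\x9C";   # IBM850:       0x9C is the pound sign

# Decoding each byte string with its own table gives the same characters
my $from_ansi = decode('cp1252', $ansi_bytes);
my $from_oem  = decode('cp850',  $oem_bytes);

# Decoding Windows-1252 bytes with the IBM850 table gives another character
my $mismatch = decode('cp850', $ansi_bytes);

printf "cp1252 0xA3 -> U+%04X\n", ord substr($from_ansi, -1);
printf "cp850  0x9C -> U+%04X\n", ord substr($from_oem,  -1);
printf "cp850  0xA3 -> U+%04X\n", ord substr($mismatch,  -1);
```

The first two lines print the same code point; the third does not, which is the kind of garbling described in the question.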

Edit:

It is possible to determine encodings heuristically (meaning it will sometimes fail to do the right thing, especially for such short strings). The best general-purpose solution is Encode::Detect/Encode::Detect::Detector, which is based on Mozilla's nsUniversalDetector.

There are some ways to decode external data implicitly, such as the open pragma/IO layers and the -C switch; however, they deal with file streams and program arguments only. As of now, data from the environment must be decoded explicitly. I like that better anyway: being explicit shows the maintenance programmer that you have thought the topic through.
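A minimal sketch of that explicit style follows. The helper name env_decoded is hypothetical (it is not part of Encode or of the test scripts above), and the encoding passed to it is an assumption the caller has to get right for their system.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return an environment variable as a character
# string, decoded from the encoding the caller assumes it is stored in.
sub env_decoded {
    my ($name, $encoding) = @_;
    my $bytes = $ENV{$name};          # copy, so %ENV itself is untouched
    return undef unless defined $bytes;
    return decode($encoding, $bytes);
}

# Stand-in for a value set outside the program (0xA3 is "£" in Windows-1252)
$ENV{TEST} = "abc\xA3";

my $text = env_decoded('TEST', 'cp1252');   # assumed encoding
printf "%d characters, last is U+%04X\n",
    length($text), ord(substr($text, -1));

# Going the other way is just as explicit: encode to bytes before
# handing the value to anything outside the program.
my $bytes_out = encode('cp1252', $text);
```

Keeping the encoding name at the call site means a reader can see which assumption was made for each variable.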

daxim answered Sep 20 '22 13:09