I've already read through the following:
but I probably missed some BASIC points.
Does using
use open qw(:utf8);
affect CPAN modules too? E.g. when some CPAN module opens a file, will it be opened with :utf8? Is this statement TRUE (or is the open pragma only lexically scoped)? AFAIK it affects modules too, but in an "inconsistent" way (which is probably a problem of the modules).
Does the open pragma have any effect on opendir? From what I have tried, no: I still need an extra decode on all filenames coming from readdir (in addition to NFC). So is IO::Dir something different that the open pragma doesn't cover?
Does the open pragma affect sockets and pipes too (i.e. anything that is a kind of IO::Handle)?
Do all (or most) CPAN modules know how they need to do their I/O (utf8, latin1 or raw)? Probably not, because even a simple autodie doesn't work with the open pragma... :(
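The extra decode step for readdir described in the questions above might look roughly like the following. This is only a sketch: `read_dir_decoded` is a hypothetical helper name, and the code assumes the filesystem stores names as UTF-8 bytes, which is not guaranteed everywhere.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempdir);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: return the entries of $dir as decoded,
# NFC-normalized character strings.
# Assumption: the filesystem stores names as UTF-8 bytes.
sub read_dir_decoded {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "opendir $dir: $!";
    my @names = map  { NFC(decode('UTF-8', $_)) }
                grep { $_ ne '.' && $_ ne '..' }
                readdir $dh;
    closedir $dh;
    return @names;
}

# Usage: create one file with a non-ASCII name and read it back.
my $dir  = tempdir(CLEANUP => 1);
my $name = "t\x{F6}k\x{F6}s.txt";    # "tökös.txt" as a character string
open(my $fh, '>', "$dir/" . encode('UTF-8', $name)) or die "open: $!";
close $fh;

my @names = read_dir_decoded($dir);
```

The file name is encoded by hand before it is handed to open, because file names are passed to the system as bytes regardless of any I/O layers.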
In many places I can read a similar rule: Remember the canonical rule of Unicode: always encode/decode at the edges of your application. This is a nice rule, but here the application edge means my own source code. CPAN modules are (usually) beyond the edge too, not only the "outer world" like the system or the network...
From my experience, 3/4 of the content of my short scripts (which rely heavily on CPAN) consists of top declarations and dozens of encode/decode/NFC calls for nearly everything...
E.g. even logging utilities need explicit encoding:
use utf8;
use Encode qw(encode);
use Log::Any qw($log);
use Log::Any::Adapter ('File', 'file.log');
$log->error(encode('utf-8', "tökös"));
Even when I want to add tie to my code, I need to replace every $key and $value with encoded versions.
Is this true, or did I miss some really basic point in all the documentation above?
Some CPAN modules handle utf8 internally, like JSON::XS, YAML::XS, File::Slurp (although I never succeeded in getting correct "things" from YAML::XS; pure YAML and JSON::XS work without any problems)...
For some modules there are "hacks", like DBIx::Class::ForceUTF8, Template::Stash::ForceUTF8, HTML::FillInForm::ForceUTF8 and so on, which don't allow writing a correct application for "both" the utf8 and the non-utf8 world... ;(
Many CPAN modules don't internally call the above 'hacked variants' (e.g. HTML::FillInForm::ForceUTF8) but only the plain one, so it is impossible to use them correctly with utf8... Others silently fail.. ;(
A Plack application doesn't handle utf8 log messages without the annoying "Wide character...." warning ;( /modern perl :(/ and I could go on ;(
From the above I "deduced" (probably wrongly) that I MUST know and remember, for every CPAN module, how it handles utf8-encoded strings, and because there is no "registry" anywhere, it is mostly trial and error.
So the main question is:
I remember that there is no magic bullet, but is there some good way to detect and know "utf8-ready CPAN modules" that don't need special encode/decode before using them?
In case someone needs to know, I'm using the following in every script:
use 5.014;
use warnings;
use utf8;
use feature qw(unicode_strings);
use charnames qw(:full);
use open qw(:utf8); # this is sometimes bad, so using only: use open qw(:std :utf8);
use Encode qw(encode decode);
use Unicode::Normalize qw(NFD NFC);
Hm.. I just "discovered" the utf8::all Perl module, which replaces readdir with a version that does the decode.
Emphasis mine:
The open pragma serves as one of the interfaces to declare default "layers" (also known as "disciplines") for all I/O. Any two-argument open, readpipe (aka qx//) and similar operators found within the lexical scope of this pragma will use the declared defaults. Even three-argument opens may be affected by this pragma when they don't specify IO layers in MODE.
So no, it doesn't affect any code in which the pragma isn't present. A handle opened within the scope of such a pragma won't lose its layers if passed to code outside of the scope of the pragma, though.
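The lexical scoping can be checked with a small experiment along these lines. It is a sketch rather than a definitive test: `open_outside` is a hypothetical stand-in for code in some module compiled without the pragma, and `PerlIO::get_layers` is used to inspect the layers on each handle.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# An empty scratch file that both opens can read.
my (undef, $path) = tempfile(UNLINK => 1);

# Hypothetical stand-in for module code: compiled OUTSIDE the pragma's scope.
sub open_outside {
    open(my $fh, '<', $path) or die "open: $!";
    return $fh;
}

my $inside;
{
    use open qw(:utf8);    # in effect only until the end of this block
    open($inside, '<', $path) or die "open: $!";
}

my $outside = open_outside();

# Both handles are inspected here, outside the pragma's scope.
my $in_has  = grep { $_ eq 'utf8' } PerlIO::get_layers($inside);
my $out_has = grep { $_ eq 'utf8' } PerlIO::get_layers($outside);

print "opened inside scope:  ", ($in_has  ? "has" : "no"), " :utf8 layer\n";
print "opened outside scope: ", ($out_has ? "has" : "no"), " :utf8 layer\n";
```

The handle opened inside the block still carries its layer when examined at file level, while the handle opened by the sub defined outside the block never received one.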
Tests to see what a module expects:
Pass it the string after calling utf8::downgrade($_); first.
Pass it the same string after calling utf8::upgrade($_); first.
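One way to read these two tests is sketched below, under the assumption that a well-behaved routine should return the same result for both internal representations of the same string. `consistent_for` is a hypothetical helper, and the two anonymous subs stand in for calls into the module being examined.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code on the downgraded and on the upgraded form
# of the same string and report whether both results are equal.
sub consistent_for {
    my ($code, $string) = @_;
    my $down = $string;
    my $up   = $string;
    utf8::downgrade($down);    # dies if $string has code points above 255
    utf8::upgrade($up);
    return $code->($down) eq $code->($up);
}

my $text = "t\x{F6}k\x{F6}s";    # "tökös", all code points below 256

# A routine that works on characters does not care about the representation.
my $chars_ok = consistent_for(sub { length $_[0] }, $text);

# A routine that peeks at the internal bytes gives different answers.
my $bytes_ok = consistent_for(sub { use bytes; length $_[0] }, $text);

print "character-based routine consistent: ", ($chars_ok ? "yes" : "no"), "\n";
print "byte-peeking routine consistent:    ", ($bytes_ok ? "yes" : "no"), "\n";
```

A routine that fails this comparison depends on the internal storage format of the string, so its callers have to know which form to hand it.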