I'm puzzled with this test script:
#!perl
use strict;
use warnings;
use encoding 'utf8';
use Test::More 'no_plan';
ok('áá' =~ m/á/, 'ok direct match');
my $re = qr{á};
ok('áá' =~ m/$re/, 'ok qr-based match');
like('áá', $re, 'like qr-based match');
The three tests fail, but I was expecting that the use encoding 'utf8'
would upgrade both the literal áá
and the qr
-based regexps to utf8 strings, and thus passing the tests.
If I remove the use encoding
line the tests pass as expected, but I can't figure it out why would they fail in utf8
mode.
I'm using perl 5.8.8 on Mac OS X (system version).
Do not use the encoding
pragma. It’s broken. (Juerd Waalboer gave a great talk where he mentioned this at YAPC::EU 2k8.)
It does at least two things at once that do not belong together:
And to add injury to insult it also does #1 in a broken fashion: it reinterprets \xNN
sequences as being undecoded octets as opposed to treating them like codepoints, and decodes them, preventing you from being able to express characters outside the encoding you specified and making your source code mean different things depending on the encoding. That’s just astonishingly wrong.
Write your source code in ASCII or UTF-8 only. In the latter case, the utf8
pragma is the correct thing to use. If you don’t want to use UTF-8, but you do want to include non-ASCII charcters, escape or decode them explicitly.
And use I/O layers explicitly or set them using the open
pragma to have I/O automatically transcoded properly.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With