
Why does this regex with a 2-byte unicode char emit "uninitialized" warning for the lvalue on match?

The following code:

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;
use 5.012; # implicitly turn on feature unicode_strings
my $test = "some string";
$test =~ m/.+\x{2013}/x;

Yields:

Use of uninitialized value $test in pattern match (m//) at test.pl line 9.

This seems to happen with any wide character (a code point above 0xFF) inside \x{}. The following regexes work fine:

/a+\x{2013}/
/.*\x{2013}/
/.+\x{20}/

Also, the error goes away with use bytes, but using that pragma is discouraged. What's going on here?

Arkadiy Kukarkin asked Sep 10 '12 20:09


2 Answers

This was a bug, and it has now been fixed in blead by commits 7e0d5ad7c9cdb21b681e611b888acd41d34c4d05 and c72077c4fff72b66cdde1621c62fb4fd383ce093. The fix should be available in 5.17.5.
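
To see whether a given perl is at or past that release, one can compare its version against 5.17.5. This is a minimal sketch, assuming the fix did ship in 5.17.5 as expected above; has_regex_fix is a hypothetical helper name, not something Perl provides:

```perl
use strict;
use warnings;
use version;

# Assumption: the fix shipped in 5.17.5, as the answer above expects.
my $FIXED_IN = version->parse('v5.17.5');

# Hypothetical helper: true if the given version string is at or past the fix.
sub has_regex_fix {
    my ($perl_version) = @_;
    return version->parse($perl_version) >= $FIXED_IN ? 1 : 0;
}

# "$^V" stringifies the running interpreter's version, e.g. "v5.36.0".
my $running = "$^V";
print has_regex_fix($running)
    ? "perl $running should include the fix\n"
    : "perl $running predates the fix\n";
```

Comparing version objects rather than raw strings avoids mis-ordering releases such as 5.8 and 5.17.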

khw answered Nov 02 '22 16:11


It is singular that you should ask this question. It looks related to a bug that I reported just yesterday:

https://rt.perl.org/rt3/Ticket/Display.html?id=114808

where this code also produces "Use of uninitialized value $_ in split ..." warnings, and causes split to unexpectedly return an empty list:

use warnings;
binmode *STDOUT, ":encoding(UTF-8)";
my $pattern = "\x{abc}\x{def}ghi";
for ( "\x{444}", "norm\x{a0}l", "\x{445}", "ab\x{ccc}de\x{fff}gh" ) {
  print "--------------------\ntext is $_, pattern is /$pattern/\n";

  # expect  split  to return  ($_) , but when $pattern and $_ both
  # have wide chars, it returns  ()
  print 'split output is [', split /$pattern/, $_;

  print "]\n";
}
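
The expectation in the comment above can also be written as a direct check. This is a minimal sketch that reuses the pattern and the last test string from the code above; on a perl without the bug, split with a pattern that does not match should return the whole string as its only field:

```perl
use strict;
use warnings;

my $pattern = "\x{abc}\x{def}ghi";
my $text    = "ab\x{ccc}de\x{fff}gh";

# The pattern does not occur in $text, so split has nothing to cut on.
my @fields = split /$pattern/, $text;

# Expected on an unaffected perl: exactly one field, identical to $text.
my $ok = (@fields == 1 && $fields[0] eq $text) ? 1 : 0;
print $ok
    ? "split behaved as expected\n"
    : "split returned an unexpected list\n";
```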

mob answered Nov 02 '22 14:11