Consider the following input data in file y.txt (encoded in UTF-8).
bar
föbar
and a file y.pl, which puts the two input lines into an array and processes them, looking for substring start positions.
use open qw(:std :utf8);
my @array;
while (<>) {
  push @array, $_;
  print $-[0] . "\n" if /bar/;
}
# $array[0] = "bar\n", $array[1] = "föbar\n" (the lines are not chomped)
print $-[0] . "\n" if $array[1] =~ /$array[0]/u;
If I call perl y.pl < y.txt, I get
0
2
3
as the output. However, I would expect the last number to be 2 as well; for some reason the second match, the one that interpolates $array[0] into the regexp, behaves differently. What am I missing? I guess it's an encoding issue, but nothing I tried helped. This is Perl 5.18.2.
It appears to be a bug in 5.18.
$ 5.18.2t/bin/perl a.pl a
0
2
3
$ 5.20.1t/bin/perl a.pl a
0
2
2
I can't find a workaround. Adding utf8::downgrade($array[0]); or utf8::downgrade($array[0], 1); works for the case you presented, but not for the following data, or for any other data where the interpolated pattern contains characters above 255.
♠bar
f♠♠bar
It appears that this can only be fixed by upgrading your Perl, which is actually quite simple. (Just make sure to install it to a different directory than your system perl by following the instructions in INSTALL!)
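If upgrading is not an option and all you need is the start position of a literal substring (as in your example) rather than a real regexp match, index() may be enough, since it does not go through the regexp engine at all. A minimal sketch, assuming the strings are already decoded to characters; char_offset is a name made up for illustration, and I have not run this on 5.18.2:

```perl
use strict;
use warnings;
use utf8;    # the string literals below are written as UTF-8 in the source

# Hypothetical helper (not part of the question's code): returns the
# character offset of a literal substring, or -1 if it is not found.
# index() works on characters for decoded strings and bypasses the
# regexp engine, so @- is not involved.
sub char_offset {
    my ($haystack, $needle) = @_;
    return index($haystack, $needle);
}

print char_offset("föbar",  "bar"),  "\n";
print char_offset("f♠♠bar", "♠bar"), "\n";
```

Both calls should print 2. This only covers plain substring search; anything that needs actual regexp features still has to go through a match and @-.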