Does assigning to pos in a string count as a "write", triggering a copy? (Tested with perl 5.26 on OS X)
I'm writing a small lexing utility. One of the things that comes up frequently is searching for a pattern starting at a given offset ... and returning the matched string if there was one.
In order to support repeatedly attempting to consume a token, I need my function to set the pos to just after the match if we're successful and to the place where we began the search if we are not.
e.g.
my $string = "abc";
consume($string, qr/b/, 1);
printf "%s\n", pos($string); # should print 2
pos($string) = 0; # reset the pos, just to demonstrate
# the intended behavior when there isn't a match
consume($string, qr/z/, 1);
printf "%s\n", pos($string); # should print 1
Here's an implementation that returns the right thing but doesn't set the pos correctly.
package TokenConsume;
use strict;
use warnings;
use Exporter qw[import];
our @EXPORT_OK = qw[consume];
sub consume {
    my ($str, $pat, $pos) = @_;
    pos($str) = $pos;
    my $out = undef;
    if ($str =~ $pat) {
        $out = substr $str, $-[0], ($+[0] - $-[0]);
        pos($str) = $+[0];
    } else {
        pos($str) = $pos;
    }
    return $out;
}
Here's an example test from the test suite for the module:
do {
    my $str = "abc";
    pos($str) = 0;
    my $res = consume($str, qr/z/, 1);
    is($res, undef, "non-first: failed match should capture nothing");
    is(pos($str), 1, "non-first: failed match should return pos to beginning of search");
};
It fails with the following message (another test fails too):
# Failed test 'non-first: failed match should return pos to beginning of search'
# at t/test_tokenconsume.t line 38.
# got: '0'
# expected: '1'
# Looks like you failed 2 tests of 7.
I can fix this by passing in a string reference instead and slightly changing the API. Here's the new implementation for completeness.
sub consume {
    my ($str_ref, $pat, $pos) = @_;
    pos($$str_ref) = $pos;
    my $out = undef;
    if ($$str_ref =~ $pat) {
        $out = substr $$str_ref, $-[0], ($+[0] - $-[0]);
        pos($$str_ref) = $+[0];
    } else {
        pos($$str_ref) = $pos;
    }
    return $out;
}
So, what's going on here? Why isn't the assignment to pos(...) propagating back to the original value unless I use a reference?
Does assigning to pos trigger a copy?
Perl 5.20 introduced a copy-on-write mechanism which allows scalars to share a string buffer.
No, changing pos($str) doesn't trigger a copy.
$ perl -MDevel::Peek -e'
$_="abcdef"; Dump($_);
pos($_) = 2; Dump($_);
pos($_) = 3; Dump($_);
$_ .= "g"; Dump($_);
' 2>&1 | grep -P '^(?:SV|  FLAGS|  PV)'
SV = PV(0x192ee10) at 0x196d4c8
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x1955140 "abcdef"\0

SV = PVMG(0x1985810) at 0x196d4c8
  FLAGS = (SMG,POK,IsCOW,pPOK)
  PV = 0x1955140 "abcdef"\0

SV = PVMG(0x1985810) at 0x196d4c8
  FLAGS = (SMG,POK,IsCOW,pPOK)
  PV = 0x1955140 "abcdef"\0

SV = PVMG(0x1985810) at 0x196d4c8
  FLAGS = (SMG,POK,pPOK)
  PV = 0x1962360 "abcdefg"\0
[Blank lines added to output for readability.]
As denoted by the IsCOW flag, $_ shares its string buffer (PV) with another scalar (the constant). Assigning to pos doesn't change that. Appending to $_, on the other hand, causes the string buffer to be copied (0x1955140 ⇒ 0x1962360, and the IsCOW flag is lost).
Why isn't the assignment to pos(...) propagating back to the original value unless I use a reference?
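Because it would be really bad if changing one variable ($str) changed some other unrelated variable ($string)! The statement my ($str, $pat, $pos) = @_; copies the argument into a new scalar, and pos belongs to the scalar, not to the string buffer. That the two scalars might share a string buffer is an irrelevant implementation detail.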
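A minimal sketch of that independence, using illustrative variable names: assigning a string to another variable creates a separate scalar with its own pos, so setting pos on the copy leaves the original untouched.

```perl
use strict;
use warnings;

my $string = "abc";
pos($string) = 0;

# Plain assignment creates a separate scalar; pos() is tracked per scalar.
my $str = $string;
pos($str) = 2;

my $orig_pos = pos($string);   # pos of the original variable
my $copy_pos = pos($str);      # pos of the copy
print "original: $orig_pos, copy: $copy_pos\n";   # original: 0, copy: 2
```

This is the same situation as inside the first consume: $str there is a copy, so its pos is thrown away when the sub returns.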
That said, Perl passes arguments by reference: $_[0] is an alias for $string (the argument), so assigning to pos($_[0]) also changes pos($string), since they are the same variable.
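A hedged sketch of how consume could rely on that aliasing while keeping the original calling convention. The name consume_alias is hypothetical, and using the /g and /c modifiers (so the search starts at pos and a failed match leaves pos alone) is an assumption about the intended behavior rather than something taken from the question's code.

```perl
use strict;
use warnings;

# Hypothetical variant of consume(): works on $_[0], the alias of the
# caller's variable, instead of copying the string into a lexical.
sub consume_alias {
    my (undef, $pat, $start) = @_;   # skip $_[0] so it is never copied

    pos($_[0]) = $start;

    # /g begins the search at pos($_[0]) and, on success, moves pos to the
    # end of the match; /c keeps pos unchanged when the match fails.
    if ($_[0] =~ /$pat/gc) {
        return substr($_[0], $-[0], $+[0] - $-[0]);
    }
    return;
}

my $string = "abc";

my $hit            = consume_alias($string, qr/b/, 1);
my $pos_after_hit  = pos($string);    # 2: just after the matched "b"

my $miss           = consume_alias($string, qr/z/, 1);
my $pos_after_miss = pos($string);    # 1: where the failed search began

printf "%s\n", $pos_after_hit;
printf "%s\n", $pos_after_miss;
```

The caller still writes consume_alias($string, ...) with a plain string, so no change to a reference-based API is needed; the trade-off is that the sub must never copy $_[0] into a lexical before touching pos.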