
Perl: does assigning to pos trigger a copy?

Tags: perl

Does assigning to pos in a string count as a "write", triggering a copy? (Tested with perl 5.26 on OS X)

I'm writing a small lexing utility. One of the things that comes up frequently is searching for a pattern starting at a given offset ... and returning the matched string if there was one.

In order to support repeatedly attempting to consume a token, I need my function to set the pos to just after the match if we're successful and to the place where we began the search if we are not.

e.g.

my $string = "abc";
consume($string, qr/b/, 1);
printf "%s\n", pos($string); # should print 2

pos($string) = 0; # reset the pos, just to demonstrate
                  # the intended behavior when there isn't a match

consume($string, qr/z/, 1);
printf "%s\n", pos($string); # should print 1

Here's an implementation that returns the right thing but doesn't set the pos correctly.

package TokenConsume;
use strict;
use warnings;

use Exporter qw[import];
our @EXPORT_OK = qw[consume];

sub consume {
    my ($str, $pat, $pos) = @_;
    pos($str) = $pos;
    my $out = undef;
    if ($str =~ $pat) {
        $out = substr $str, $-[0], ($+[0] - $-[0]);
        pos($str) = $+[0];
    } else {
        pos($str) = $pos;
    }
    return $out;
}

Here's an example test from the module's test suite:

do {
    my $str = "abc";
    pos($str) = 0;
    my $res = consume($str, qr/z/, 1);
    is($res, undef, "non-first: failed match should capture nothing");
    is(pos($str), 1, "non-first: failed match should return pos to beginning of search");
};

It fails with the following message (another test fails too):

#   Failed test 'non-first: failed match should return pos to beginning of search'
#   at t/test_tokenconsume.t line 38.
#          got: '0'
#     expected: '1'
# Looks like you failed 2 tests of 7.

I can fix this by passing in a string reference instead and slightly changing the API. Here's the new implementation for completeness.

sub consume {
    my ($str_ref, $pat, $pos) = @_;
    pos($$str_ref) = $pos;
    my $out = undef;
    if ($$str_ref =~ $pat) {
        $out = substr $$str_ref, $-[0], ($+[0] - $-[0]);
        pos($$str_ref) = $+[0];
    } else {
        pos($$str_ref) = $pos;
    }
    return $out;
}

So, what's going on here? Why isn't the assignment to pos(...) propagating back to the original value unless I use a reference?

Gregory Nisbet asked Sep 26 '17 04:09


1 Answer

Perl does assigning to pos trigger a copy?

Perl 5.20 introduced a copy-on-write mechanism which allows scalars to share a string buffer.

No, changing pos($str) doesn't trigger a copy.

$ perl -MDevel::Peek -e'
    $_="abcdef"; Dump($_);
    pos($_) = 2; Dump($_);
    pos($_) = 3; Dump($_);
    $_ .= "g";   Dump($_);
' 2>&1 | grep -P '^(?:SV|  FLAGS|  PV)'

SV = PV(0x192ee10) at 0x196d4c8
  FLAGS = (POK,IsCOW,pPOK)
  PV = 0x1955140 "abcdef"\0

SV = PVMG(0x1985810) at 0x196d4c8
  FLAGS = (SMG,POK,IsCOW,pPOK)
  PV = 0x1955140 "abcdef"\0

SV = PVMG(0x1985810) at 0x196d4c8
  FLAGS = (SMG,POK,IsCOW,pPOK)
  PV = 0x1955140 "abcdef"\0

SV = PVMG(0x1985810) at 0x196d4c8
  FLAGS = (SMG,POK,pPOK)
  PV = 0x1962360 "abcdefg"\0

[Blank lines added to output for readability.]

As denoted by the IsCOW flag, $_ shares its string buffer (PV) with another scalar (the constant). Assigning to pos doesn't change that. Appending to $_, on the other hand, causes the string buffer to be copied (0x1955140 -> 0x1962360, and the IsCOW flag is lost).


Why isn't the assignment to pos(...) propagating back to the original value unless I use a reference?

Because it would be really bad if changing one variable ($str) changed some other unrelated variable ($string)! That they might share a string buffer is an irrelevant implementation detail.
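
A minimal sketch of that independence, using hypothetical variable names ($original, $copy) that are not from the question. Copying a scalar produces a separate variable with its own pos, whether or not the two happen to share a string buffer internally:

```perl
use strict;
use warnings;

my $original = "abc";
my $copy     = $original;   # a separate scalar; it may share the buffer under COW

pos($copy) = 2;             # sets the match position of $copy only

# pos($original) was never set, so it is still undefined.
print "original pos: ", (defined pos($original) ? pos($original) : "undef"), "\n";
print "copy pos: ", pos($copy), "\n";
```

This is the same situation as my ($str, ...) = @_; in the question's consume: $str is a fresh scalar, so its pos is unrelated to the caller's variable.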

That said, Perl passes arguments by reference: $_[0] is an alias for $string (the argument), so assigning to pos($_[0]) changes pos($string) as well, since they are the same variable.
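
The aliasing described above could be used to keep the original calling convention. What follows is only a sketch, not the asker's code: the name consume_alias is hypothetical, and it assumes the search should begin at $pos, which is why it uses the /g and /c modifiers (a /g match in scalar context starts at pos and advances it on success; /c leaves pos alone on failure).

```perl
use strict;
use warnings;

# Hypothetical variant of consume() that operates on the caller's scalar
# through the $_[0] alias instead of copying it into a lexical.
sub consume_alias {
    my (undef, $pat, $pos) = @_;   # skip the first slot so the string is not copied
    pos($_[0]) = $pos;             # start the search at the requested offset
    if ($_[0] =~ /$pat/gc) {
        # On success, pos($_[0]) has already been moved to the end of the match.
        return substr $_[0], $-[0], $+[0] - $-[0];
    }
    return;                        # on failure, /c left pos at $pos
}

my $string = "abc";

my $tok = consume_alias($string, qr/b/, 1);
printf "%s %s\n", $tok, pos($string);                          # prints "b 2"

$tok = consume_alias($string, qr/z/, 1);
printf "%s %s\n", defined $tok ? $tok : 'undef', pos($string); # prints "undef 1"
```

Whether to rely on the alias or pass an explicit reference, as in the question's second implementation, is a design choice; the reference makes it more visible at the call site that the argument is modified.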

ikegami answered Oct 29 '22 20:10