
Why are references compacted inside Perl lists?

Putting a precompiled regex inside two different hashes referenced in a list:

use strict;
use warnings;
use Data::Dumper;

my @list = ();

my $regex = qr/ABC/;

# Store the same compiled regex in two separate hashes.
push @list, { 'one' => $regex };
push @list, { 'two' => $regex };

print Dumper(\@list);

I'd expect:

$VAR1 = [
      {
        'one' => qr/(?-xism:ABC)/
      },
      {
        'two' => qr/(?-xism:ABC)/
      }
    ];

But instead we get a circular reference:

$VAR1 = [
      {
        'one' => qr/(?-xism:ABC)/
      },
      {
        'two' => $VAR1->[0]{'one'}
      }
    ];

The same thing happens however deeply the hash references are nested, and also when $regex is shallowly copied.

I'm assuming the basic reason is that precompiled regexes are actually references, and that references inside the same list structure are compacted as an optimization (\$scalar behaves the same way). I don't entirely see the utility of doing this (presumably a reference to a reference has the same memory footprint), but maybe there's a reason based on the internal representation.
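The premise above can be checked directly. In this minimal sketch (variable names are illustrative), `qr//` returns a reference blessed into the `Regexp` class, and storing that reference twice stores two copies of the same address, so both hashes point at one object. A plain scalar reference shared the same way makes Data::Dumper emit the same `$VAR1->[0]{'one'}` notation.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(refaddr);

# qr// gives back a reference to a compiled pattern, blessed into Regexp.
my $regex = qr/ABC/;
print "ref(\$regex) is ", ref($regex), "\n";

# Storing $regex twice stores two copies of the same address.
my @list = ( { one => $regex }, { two => $regex } );
print "same referent: ",
    ( refaddr( $list[0]{one} ) == refaddr( $list[1]{two} ) ? "yes" : "no" ), "\n";

# A plain scalar reference stored twice is dumped the same way.
my $value   = 42;
my @scalars = ( { one => \$value }, { two => \$value } );
my $dump    = Dumper( \@scalars );
print $dump;
```

Nothing here is regex-specific: any reference copied into two places yields a shared referent, and Data::Dumper reports the second occurrence as a path to the first.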

Is this the correct behavior? Can I stop it from happening? Aside from probably making GC more difficult, these circular structures create pretty serious headaches. For example, iterating over a list of queries that sometimes contain the same regular expression crashes the MongoDB driver with a nasty segfault (see https://rt.cpan.org/Public/Bug/Display.html?id=58500).

asked Jun 17 '10 23:06 by Arkadiy Kukarkin



2 Answers

This is the expected behavior.

Your reference isn't really circular; you have two separate items that point to the same thing. Data::Dumper is printing a human-readable, Perl-parsable representation of your data structures in memory, and what it really means is that both $list[0]->{one} and $list[1]->{two} point to the same thing.
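For contrast, here is a sketch of a structure that really does contain two separate things. Evaluating `qr/ABC/` once per hash (instead of reusing one `$regex`) creates two distinct `Regexp` objects, so Data::Dumper has nothing to cross-reference and prints both patterns in full. This assumes you actually want independent objects, for example as a workaround for a consumer that mishandles shared references; Perl itself does not need it.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(refaddr);

# Evaluate qr// once per hash instead of reusing one $regex.
my @list;
push @list, { one => qr/ABC/ };
push @list, { two => qr/ABC/ };

# Each qr// evaluation produced its own object, so the addresses differ.
my $distinct = refaddr( $list[0]{one} ) != refaddr( $list[1]{two} );
print "distinct objects: ", ( $distinct ? "yes" : "no" ), "\n";

# Both patterns are printed in full; no $VAR1->... path appears.
my $dump = Dumper( \@list );
print $dump;
```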

Perl uses reference-counting garbage collection, and while it can get into trouble with circular data structures, this data structure presents no particular problem.
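Reference counting only fails to reclaim memory when a chain of references leads back to its starting point. The minimal sketch below uses a hypothetical `Tracker` class whose `DESTROY` method records when an object is freed. An object shared between two hashes (the question's shape) is freed as soon as its last holder goes away; an object whose hash points back to itself is not, unless the back-link is made weak with `Scalar::Util::weaken`.

```perl
use strict;
use warnings;

# Tracker is a hypothetical helper class for this sketch: its DESTROY
# method records the object's name at the moment Perl frees it.
package Tracker;
our %freed;
sub new     { my ( $class, $name ) = @_; return bless { name => $name }, $class }
sub DESTROY { my ($self) = @_; $freed{ $self->{name} } = 1 }

package main;
use Scalar::Util qw(weaken);

# Case 1: one object shared by two hashes, as in the question.
{
    my $shared = Tracker->new('shared');
    my @list   = ( { one => $shared }, { two => $shared } );
}    # last references dropped here, so DESTROY runs immediately

# Case 2: a genuine cycle (the object's hash points back to the object).
{
    my $cyclic = Tracker->new('cycle');
    $cyclic->{self} = $cyclic;    # refcount can never reach zero on its own
}

# Case 3: the same cycle, broken with a weak reference.
{
    my $weak = Tracker->new('weakened');
    $weak->{self} = $weak;
    weaken( $weak->{self} );      # weak refs do not count toward the refcount
}

for my $name (qw(shared cycle weakened)) {
    printf "%-8s freed: %s\n", $name, ( $Tracker::freed{$name} ? "yes" : "no" );
}
```

This should report `shared` and `weakened` as freed and `cycle` as not freed; the cyclic object is only reclaimed during global destruction at program exit.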

answered Oct 15 '22 12:10 by Commodore Jaeger


Nothing funny is happening here.

  1. You stored the same reference twice in the same data structure.
  2. Then you asked Data::Dumper to print a representation of that structure.
  3. Data::Dumper wants to roundtrip the data you give it as faithfully as possible, which means that it needs to output Perl code that will generate a data structure that contains the same reference at $list[0]{one} as it does at $list[1]{two}.
  4. It does this by outputting a data structure where one member contains a reference to another member of the same structure.
  5. But it's not actually a circular reference.
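If the goal is only to make the dump look like the expected output, Data::Dumper documents a `$Data::Dumper::Deepcopy` setting that prints shared substructures in full and cross-references only where needed to break a genuine cycle. This minimal sketch changes the printed text only: the data still holds one shared `Regexp` object, as the `refaddr` check shows, so it would not by itself change what a consumer like the MongoDB driver receives.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(refaddr);

my $regex = qr/ABC/;
my @list  = ( { one => $regex }, { two => $regex } );

# Deepcopy: print shared references in full instead of as $VAR1->... paths.
local $Data::Dumper::Deepcopy = 1;
my $dump = Dumper( \@list );
print $dump;

# The underlying structure is unchanged: both slots still share one object.
my $still_shared = refaddr( $list[0]{one} ) == refaddr( $list[1]{two} );
print "still shared: ", ( $still_shared ? "yes" : "no" ), "\n";
```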
answered Oct 15 '22 13:10 by hobbs