Perl's main package - block syntax - pragmas and BEGIN/END blocks

Tags:

perl

I saw the question "Is there any difference between "standard" and block package declaration?" and started thinking about the main package. When I write a script like:

---- begin of the file ---
#!/usr/bin/perl  #probably removed by shell?
my $var; #defined from now up to the end of file
...
---- end of the file ----

this automatically goes into the main package, so, if I understand correctly, the following happens:

---- begin of the file ---
{ #<-- 1st line
  package main;
  my $var; #variable transformed to block scope - "up to the end of block"
  ...
} # <-- last line
---- end of the file ----

which is equivalent to

---- begin of the file ---
package main { #1st line
  my $var; #variable block scope
  ...
} #last line
---- end of the file ----

Question 1: Is the above right? Is that what happens with the main package?

Now the BEGIN/END blocks and pragmas. They are handled in the compilation phase, if I understand correctly. So with:

---- begin of the file ---
#!/usr/bin/perl
use strict;    #file scope
use warnings;  #file scope
my $var; #defined from now up to the end of file
BEGIN {
    say $var; #the $var is not known here - but it is declared
}
...
---- end of the file ----

the $var is declared, but here

---- begin of the file ---
#!/usr/bin/perl
use strict;    #file scope
use warnings;  #file scope

BEGIN {
    say $var; #the $var is not known here - but "requires explicit package name" error
}

my $var; #defined from now up to the end of file
...
---- end of the file ----

the $var is not declared.

So how is the above translated to "default main package"?

Is it always:

---- begin of the file ---
{
  package main;
  use strict;    #block scope ???
  use warnings;  #block scope ???
  my $var; #defined from now up to the end of block

  BEGIN { #NESTED???
    say $var; #the $var is not known here - but declared
  }
  ...
}
---- end of the file ----

which is equivalent to

---- begin of the file ---
package main {
  use strict;    #block scope
  use warnings;  #block scope
  my $var; #defined from now up to the end of block

  BEGIN {  #NESTED block
    say $var;
  }
  ...
}
---- end of the file ----

The question is: is there ANY benefit in using something like:

  ---- begin of the file ---
  use strict;   #always should be at the START OF THE FILE - NOT IN BLOCKS?
  use warnings;

  #not NESTED
  BEGIN {
  }

  package main {
     my $var;
  }

So the question is:

  • how exactly are pragmas, BEGIN/END/CHECK blocks and the main package handled in the context of the BLOCK syntax?
  • when does the "file scope" change to "block scope"? Or, if it does not change, what is the equivalent translation of the "standard main package" to "package main {block}"?

and the last code:

  ---- begin of the file ---
  use strict;   #always should be at the START OF THE FILE - NOT IN BLOCKS?
  use warnings;
  my $var;

  #not NESTED
  BEGIN {
  }

  package main {

  }

How does the my $var get into the main package? Is this translated to something like:

  ---- begin of the file ---
  use strict;   #always should be at the START OF THE FILE - NOT IN BLOCKS?
  use warnings;

  #not NESTED
  BEGIN {
  }

  package main {
      my $var; #### GETS HERE????
  }

Sorry for the wall of text...

cajwine asked Aug 01 '13 08:08


2 Answers

When you declare a variable with my, it is not in any package. At all. Lexical (block) scope is entirely separate from packages. The variable is visible, under its unqualified name only, until the closing brace (}) of the innermost enclosing block. If you wrote $main::var or $::var, that would be a different variable.

use warnings;
use strict;
use feature 'say';
package main {
    my $var = 'this';
}
$var; # compile-time error under strict: $var is not declared in this scope
say $main::var; # a different, never-assigned package variable: prints an empty line

There are two more ways to declare variables:

  • use vars qw($var) declares the package variable $var in the current package; the unqualified name is then accepted anywhere in that package, regardless of block scope.
  • our $var makes $var an alias, for the rest of the enclosing block, for the package variable in the package that was current at the time of the our statement.
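
As a minimal sketch of the difference between my and our (the package name Foo and both variable names are made up for illustration, and the block form of package assumes Perl 5.14 or later):

```perl
use strict;
use warnings;
no warnings 'once';   # silence "used only once" for the $Foo::... lookups below
use feature 'say';

package Foo {
    my  $lexical = 'this';   # lives only in this block, not in package Foo
    our $global  = 'that';   # alias for the package variable $Foo::global
}

# The fully qualified names look in the package's symbol table only.
say defined $Foo::lexical ? 'defined' : 'undefined';   # undefined
say $Foo::global;                                       # that
```

The lexical never shows up under a package-qualified name, while the our variable is reachable from outside the block as $Foo::global.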

The block form of package is an ordinary block that additionally puts its contents into the package. The block-less package declaration, by contrast, puts the code that follows into another package, but the current block scope continues.
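
A small sketch of that distinction, assuming nothing beyond core Perl (the package names Other and Inner and the variable names are hypothetical):

```perl
use strict;
use warnings;
use feature 'say';

my $shared = 'file-scoped lexical';

package Other;                 # changes the current package, opens no new scope
say __PACKAGE__, ": $shared";  # Other: file-scoped lexical

package main;
{
    package Inner;             # in effect only until the closing brace
    my $hidden = 'block-scoped';
    say __PACKAGE__, ": $hidden";   # Inner: block-scoped
}
say __PACKAGE__;               # main
```

The lexical $shared stays visible after the switch to Other, whereas $hidden and the Inner package both end at the closing brace.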

The other missing bit is that when you write

 use warnings;
 use strict;
 package main {
 # ...
 }

you've effectively written

 package main {
     use warnings;
     use strict;
     package main {
     # ...
     }
 }

and since the package is the same, that's the same as

 package main {
     use warnings;
     use strict;
     {
     # ...
     }
 }

In other words, at the beginning of the file the package is main and an implicit block scope (the file scope) is open. Re-entering the main package has no effect, and if the declaration comes with a block, that block behaves like any other block.
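
To see that a pragma follows block scope like any other lexical setting, here is a rough sketch that counts warnings through a $SIG{__WARN__} handler. The handler and the variable names are illustrative only, and it assumes the script is not run with the -w switch:

```perl
use strict;
use feature 'say';

my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };   # collect instead of printing

my $undef;
{
    use warnings;            # in effect only until the closing brace
    my $s = "a" . $undef;    # warns: use of uninitialized value
}
my $t = "b" . $undef;        # outside the block: warnings are off, no warning

say scalar @warnings;        # 1
```

Only the concatenation inside the block is reported, because use warnings ends with the block it was written in.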

Jan Hudec answered Oct 22 '22 04:10


Scope and execution order have little to do with each other.

Yes, the default package is main. So it could be said that

---- begin file ----
1: #!/usr/bin/perl
2: my $var;
3: ...;
---- end file ----

is equivalent to

package main {
---- begin file ----
1: #!/usr/bin/perl
2: my $var;
3: ...;
---- end file ----
}

It is simply that the main package is assumed unless another is specified. This does not change line numbers etc.

When a variable declaration is encountered, it is immediately added to the list of known variables. Or more precisely, as soon as the statement where it was declared has ended:

my      # $var unknown
$var    # $var unknown
=       # $var unknown
foo()   # $var unknown
;       # NOW $var is declared
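
One way to observe this rule is to shadow a variable and use the old one on the right-hand side of the same statement (a minimal sketch, variable names chosen for illustration):

```perl
use strict;
use warnings;
use feature 'say';

my $x = 'outer';
{
    my $x = $x . ' copied';   # the right-hand $x is still the outer one
    say $x;                   # outer copied
}
say $x;                       # outer
```

The new $x only becomes visible after its declaring statement has ended, so the initializer still sees the outer variable.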

The same goes for pragmas: a use statement is executed as soon as it is fully parsed. From the next statement on, all its imports are available.

Blocks like BEGIN are executed outside of the normal control flow, but obey scoping rules.

BEGIN blocks are executed as soon as they are fully parsed. The return value is discarded.

END blocks are executed when the interpreter exits by normal means.

When we have

my $var = 1; # $var is now declared, but the assignment is run-time
BEGIN {
 # here $var is declared, but was not assigned yet.
 $var = 42; # but we can assign something if we like
}
# This is executed run-time: $var == 1
say $var;
BEGIN {
  # This is executed immediately. The runtime assignment has not yet happened.
  # The previous assignment in BEGIN did happen.
  say $var;
}

The result?

42
1

Note that if I do not assign a new value at runtime, this variable keeps its compile time value:

my $var;
...; # rest as before

Then we get

42
42

Blocks can be arbitrarily nested:

my $var;
if (0) {
  BEGIN {
    say "BEGIN 1: ", ++$var;
    BEGIN {
      say "BEGIN 2: ", ++$var;
      BEGIN { $var = 0 }
    }
  }
}

Output:

BEGIN 2: 1
BEGIN 1: 2

Here we can see that BEGIN blocks are executed before the if (0) is optimized away, because BEGIN is executed immediately.

We can also ask which package a block is in:

BEGIN { say "BEGIN: ", __PACKAGE__ }
say "before package main: ", __PACKAGE__;

# useless redeclaration, we are already in main
package main {
  say "in package main: ", __PACKAGE__;
}

Output:

BEGIN: main
before package main: main
in package main: main

So we were in main before we redeclared it. A package is not a sealed, immutable entity. Rather, it is a namespace we can re-enter at will:

package Foo;
say "We are starting in ", __PACKAGE__;
for (1 .. 6) {
  package Bar;
  say "Loop $_ in ", __PACKAGE__;
  if ($_ % 2) {
    package Baz;
    say "... and in ", __PACKAGE__;
    BEGIN { say "just compiled something in ", __PACKAGE__ }
  } else {
    package Foo;
    say "... again in ", __PACKAGE__;
    BEGIN { say "just compiled something in ", __PACKAGE__ }
  }
}

Output:

just compiled something in Baz
just compiled something in Foo
We are starting in Foo
Loop 1 in Bar
... and in Baz
Loop 2 in Bar
... again in Foo
Loop 3 in Bar
... and in Baz
Loop 4 in Bar
... again in Foo
Loop 5 in Bar
... and in Baz
Loop 6 in Bar
... again in Foo

So regarding this:

The question is: is there ANY benefit in using something like:

---- begin of the file ---
use strict;
use warnings;

package main {
  my $var;
}

The answer is no: If we are already in the package main, redeclaring it has no benefit:

say __PACKAGE__;
package main {
  my $var;
  say __PACKAGE__;
}
say __PACKAGE__;

If we execute that we can see we are in main the whole time.

Pragmas like strict and warnings have lexical scope, so it is good to enable them as early as possible.

# no strict yet
use strict;
# strict now activated

BEGIN {
  # we are still in scope of strict
  $var = 1; # ooh, an undeclared variable. Will it blow up?
  say "BEGIN was executed";
}

my $var;

Output:

Global symbol "$var" requires explicit package name at - line 8.
BEGIN not safe after errors--compilation aborted at - line 10.

The variable had not yet been declared when the BEGIN block was compiled, because the block comes before the declaration. Therefore, strict issues this error. Because the error occurred while the BEGIN block was being compiled, the block was never executed.

Because of scoping, you can't always reorder your source code in a way that avoids using BEGIN blocks. Here is something you should never do:

for (1 .. 3) {
  my $var;
  BEGIN { $var = 42 };
  say $var // "undef";
}

Output:

42
undef
undef

because $var is emptied whenever the block is left. (This is probably undefined behaviour and may change. It was observed under at least v5.16.3 and v5.14.2.)

When your program is compiled no reordering takes place. Instead, BEGIN blocks are executed as soon as they are compiled.

For the exact times when CHECK and END are run, read through perlmod.
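
As a rough illustration of the overall ordering, the following sketch records when each kind of block fires. It assumes the code is run as the main script file (CHECK and INIT blocks are not run for code compiled later at run time), and the array name @order is made up for this example:

```perl
use strict;
use warnings;

our @order;                        # package variable shared by all blocks

push @order, 'run';                # ordinary run-time statement
END   { push @order, 'END'; print "@order\n" }
INIT  { push @order, 'INIT' }
CHECK { push @order, 'CHECK' }
BEGIN { push @order, 'BEGIN' }
```

Run as a script, this prints BEGIN CHECK INIT run END, regardless of the order in which the blocks appear in the source.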

amon answered Oct 22 '22 04:10