Why do I have to use a * in front of a Perl bareword filehandle?

Tags: operators, perl
While trying to do this:

 my $obj = new JavaScript::Minifier;
 $obj->minify(*STDIN, *STDOUT);
 # modified the above line to:
 $obj->minify(*IP_HANDLE, *OP_HANDLE);

This works as long as IP_HANDLE and OP_HANDLE are filehandles, but I still can't figure out what the * actually does when applied to a filehandle or to any other datatype.

Thanks,

sud03r asked Feb 11 '10 07:02


2 Answers

In the bad old days before perl v5.6 (which introduced lexical filehandles more than a decade ago now), passing file and directory handles was awkward. The code from your question is written in this old-fashioned style.

The technical name for *STDIN, for example, is a typeglob, explained in the “Typeglobs and Filehandles” section of perldata. You may encounter manipulation of typeglobs for various purposes in legacy code. Note that you may grab typeglobs of global variables only, never lexicals.

Passing handles was a common purpose for dealing directly with typeglobs, but there were other uses as well. See below for details.

  • Passing filehandles to subs
  • Syntactic ambiguity: string or filehandle
  • Aliases via typeglob assignment
  • Localizing handles by localizing typeglobs
  • Peeking under the hood: *foo{THING} syntax
  • Tying it all together: DWIM!

Passing filehandles to subs

The perldata documentation explains:

Typeglobs and Filehandles

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references, this is seldom needed.

[...]

Another use for typeglobs is to pass filehandles into a function or to create new filehandles. If you need to use a typeglob to save away a filehandle, do it this way:

$fh = *STDOUT;

or perhaps as a real reference, like this:

$fh = \*STDOUT;

See perlsub for examples of using these as indirect filehandles in functions.

The referenced section of perlsub is below.

Passing Symbol Table Entries (typeglobs)

WARNING: The mechanism described in this section was originally the only way to simulate pass-by-reference in older versions of Perl. While it still works fine in modern versions, the new reference mechanism is generally easier to work with. See below.

Sometimes you don’t want to pass the value of an array to a subroutine but rather the name of it, so that the subroutine can modify the global copy of it rather than working with a local copy. In Perl you can refer to all objects of a particular name by prefixing the name with a star: *foo. This is often known as a “typeglob,” because the star on the front can be thought of as a wildcard match for all the funny prefix characters on variables and subroutines and such.

When evaluated, the typeglob produces a scalar value that represents all the objects of that name, including any filehandle, format, or subroutine. When assigned to, it causes the name mentioned to refer to whatever * value was assigned to it. [...]

Note that a typeglob can be taken on global variables only, not lexicals. Heed the warning above. Prefer to avoid this obscure technique.
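To make the difference concrete, here is a minimal sketch of the three ways a handle can reach a sub: as a lexical, as a bare typeglob, and as a reference to a typeglob. The greet helper, the OLD_STYLE handle name, and the in-memory buffers are illustrative assumptions, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): prints a greeting
# to whatever handle it is given.
sub greet {
    my ($fh, $who) = @_;
    print {$fh} "Hello, $who\n";
}

# In-memory files stand in for real ones so the output can be inspected.
my ($new_buf, $old_buf) = ('', '');

open my $lexical, '>', \$new_buf or die "open: $!";   # modern lexical handle
open OLD_STYLE,   '>', \$old_buf or die "open: $!";   # old package-global handle

greet($lexical,    'lexical');    # a lexical handle is already a plain scalar
greet(*OLD_STYLE,  'typeglob');   # old style: pass the whole typeglob
greet(\*OLD_STYLE, 'globref');    # or a real reference to the typeglob

close $lexical  or die "close: $!";
close OLD_STYLE or die "close: $!";
```

All three calls work because the sub only ever sees a scalar; the caller decides how the handle gets packed into it.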

Syntactic ambiguity: string or filehandle?

Without the * sigil, a bareword is just a string.

Simple strings sometimes suffice, however. For example, the print operator allows

$ perl -le 'print { "STDOUT" } "Hiya!"'
Hiya!

$ perl -le '$h="STDOUT"; print $h "Hiya!"'
Hiya!

$ perl -le 'print "STDOUT" +123'
123

These fail with strict 'refs' enabled. The manual explains:

FILEHANDLE may be a scalar variable name, in which case the variable contains the name of or a reference to the filehandle, thus introducing one level of indirection.

In your example, consider the syntactic ambiguity. Without the * sigil, you could mean strings

$ perl -MO=Deparse,-p prog.pl
use JavaScript::Minifier;
(my $obj = 'JavaScript::Minifier'->new);
$obj->minify('IP_HANDLE', 'OP_HANDLE');

or maybe a sub call

$ perl -MO=Deparse,-p prog.pl
use JavaScript::Minifier;
sub OP_HANDLE {
    1;
}
(my $obj = 'JavaScript::Minifier'->new);
$obj->minify('IP_HANDLE', OP_HANDLE());

or, of course, a filehandle. Note in the examples above how the bareword JavaScript::Minifier also compiles as a simple string.

Enable the strict pragma and it all goes out the window anyway:

$ perl -Mstrict prog.pl
Bareword "IP_HANDLE" not allowed while "strict subs" in use at prog.pl line 6.
Bareword "OP_HANDLE" not allowed while "strict subs" in use at prog.pl line 6.
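One way to see what the sigil changes is to ask Perl what kind of value each spelling produces. This is only a small sketch; the variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $as_string  = 'STDOUT';    # just the name, as text
my $as_glob    = *STDOUT;     # the typeglob itself, copied into a scalar
my $as_globref = \*STDOUT;    # a real reference to the typeglob

# ref(\$scalar) reports what kind of thing the scalar holds.
my @kinds = (ref(\$as_string), ref(\$as_glob), ref($as_globref));

print "@kinds\n";      # SCALAR GLOB GLOB
print "$as_glob\n";    # *main::STDOUT
```

The string carries no connection to the handle beyond its spelling, while the glob and the glob reference both point at the actual symbol table entry.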

Aliases via typeglob assignment

One trick with typeglobs that’s handy for Stack Overflow posts is

*ARGV = *DATA;

(I could be more precise with *ARGV = *DATA{IO}, but that’s a little fussy.)

This allows the diamond operator <> to read from the DATA filehandle, as in

#! /usr/bin/perl

*ARGV = *DATA;   # for demo only; remove in production

while (<>) { print }

__DATA__
Hello
there

This way, the program and its input can be in a single file, and the code is a closer match to how it will look in production: just delete the typeglob assignment.
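The same assignment works for ordinary package variables. The sketch below, with made-up variable names, contrasts assigning a reference to a glob (which aliases one slot) with assigning a whole glob (which aliases every slot).

```perl
use strict;
use warnings;

our ($x, @x, $y, @y);
$x = 'original';
@x = (1, 2, 3);

*y = \@x;       # alias only the array slot: @y is now @x
push @y, 4;     # so this changes @x as well
my $scalar_untouched = !defined $y;   # $y is still its own (undefined) scalar

*y = *x;        # alias every slot: $y, @y, %y, ... all become x's
$y = 'changed'; # so this changes $x as well

print "@x\n";   # 1 2 3 4
print "$x\n";   # changed
```

This is why *ARGV = *DATA{IO} is the more precise form: it replaces only the IO slot and leaves $ARGV and @ARGV alone.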

Localizing handles by localizing typeglobs

As noted in perlsub

Temporary Values via local()

WARNING: In general, you should be using my instead of local, because it’s faster and safer. Exceptions to this include the global punctuation variables, global filehandles and formats, and direct manipulation of the Perl symbol table itself. local is mostly used when the current value of a variable must be visible to called subroutines. [...]

you can use typeglobs to localize filehandles:

$ cat prog.pl
#! /usr/bin/perl

sub foo {
  local(*STDOUT);
  open STDOUT, ">", "/dev/null" or die "$0: open: $!";
  print "You can't see me!\n";
}

print "Hello\n";
foo;
print "Good bye.\n";

$ ./prog.pl
Hello
Good bye.

“When to Still Use local()” in perlsub has another example.

2. You need to create a local file or directory handle or a local function.

A function that needs a filehandle of its own must use local() on a complete typeglob. This can be used to create new symbol table entries:

sub ioqueue {
    local (*READER, *WRITER); # not my!
    pipe (READER, WRITER) or die "pipe: $!";
    return (*READER, *WRITER);
}
($head, $tail) = ioqueue();

To emphasize, this style is old-fashioned. Prefer to avoid global filehandles in new code, but being able to understand the technique in existing code is useful.
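A common place where localizing a typeglob still shows up is capturing output in tests. The following is a sketch under assumptions: capture_stdout is a hypothetical helper, and it only catches output written through Perl's own print, not output from child processes.

```perl
use strict;
use warnings;

# Hypothetical helper: runs a code ref with STDOUT temporarily pointed
# at an in-memory buffer and returns whatever was printed.
sub capture_stdout {
    my ($code) = @_;
    my $captured = '';
    {
        local *STDOUT;    # fresh, unopened STDOUT for this block only
        open STDOUT, '>', \$captured or die "open: $!";
        $code->();
        close STDOUT or die "close: $!";
    }                     # the original STDOUT is restored here
    return $captured;
}

my $text = capture_stdout(sub { print "You can't see me!\n" });
print "Captured: $text";
```

Because local restores the glob when the block exits, the final print goes to the real STDOUT again.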

Peeking under the hood: *foo{THING} syntax

You can get at the different parts of a typeglob, as perlref explains:

A reference can be created by using a special syntax, lovingly known as the *foo{THING} syntax. *foo{THING} returns a reference to the THING slot in *foo (which is the symbol table entry which holds everything known as foo).

$scalarref = *foo{SCALAR};
$arrayref = *ARGV{ARRAY};
$hashref = *ENV{HASH};
$coderef = *handler{CODE};
$ioref = *STDIN{IO};
$globref = *foo{GLOB};
$formatref = *foo{FORMAT};

All of these are self-explanatory except for *foo{IO}. It returns the IO handle, used for file handles (open), sockets (socket and socketpair), and directory handles (opendir). For compatibility with previous versions of Perl, *foo{FILEHANDLE} is a synonym for *foo{IO}, though it is deprecated as of 5.8.0. If deprecation warnings are in effect, it will warn of its use.

*foo{THING} returns undef if that particular THING hasn’t been used yet, except in the case of scalars. *foo{SCALAR} returns a reference to an anonymous scalar if $foo hasn’t been used yet. This might change in a future release.

*foo{IO} is an alternative to the *HANDLE mechanism given in [“Typeglobs and Filehandles” in perldata] for passing filehandles into or out of subroutines, or storing into larger data structures. Its disadvantage is that it won’t create a new filehandle for you. Its advantage is that you have less risk of clobbering more than you want to with a typeglob assignment. (It still conflates file and directory handles, though.) However, if you assign the incoming value to a scalar instead of a typeglob as we do in the examples below, there’s no risk of that happening.

splutter(*STDOUT); # pass the whole glob
splutter(*STDOUT{IO}); # pass both file and dir handles

sub splutter {
  my $fh = shift;
  print $fh "her um well a hmmm\n";
}

$rec = get_rec(*STDIN); # pass the whole glob
$rec = get_rec(*STDIN{IO}); # pass both file and dir handles

sub get_rec {
  my $fh = shift;
  return scalar <$fh>;
}
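A short sketch of pulling individual slots out of a glob, assuming a package array @queue and a sub named handler that exist only for this illustration:

```perl
use strict;
use warnings;

our @queue = (1, 2, 3);
sub handler { return 42 }

my $aref = *queue{ARRAY};     # same reference as \@queue
my $cref = *handler{CODE};    # same reference as \&handler
my $io   = *STDOUT{IO};       # the IO object behind STDOUT

push @$aref, 4;               # modifies @queue through the slot reference
my $same_array = ($aref == \@queue);
my $answer     = $cref->();

print "@queue\n";             # 1 2 3 4
print "$answer\n";            # 42
```

Each lookup returns an ordinary reference, so the rest of the code needs no typeglob handling at all.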

Tying it all together: DWIM!

Context is key with Perl. In your example, although the syntax may be ambiguous, the intent is not: even if the parameters are strings, those strings are clearly intended to name filehandles.

So consider all the cases minify may need to handle:

  • bareword
  • bare typeglob
  • reference to typeglob
  • filehandle in a scalar

For example:

#! /usr/bin/perl

use warnings;
use strict;

*IP_HANDLE = *DATA;
open OP_HANDLE, ">&STDOUT";
open my $fh, ">&STDOUT";
my $offset = tell DATA;

use JavaScript::Minifier;
my $obj = JavaScript::Minifier->new;
$obj->minify(*IP_HANDLE, "OP_HANDLE");

seek DATA, $offset, 0 or die "$0: seek: $!";
$obj->minify(\*IP_HANDLE, $fh);

__DATA__
Ahoy there
matey!

As a library author, being accommodating can be useful. To illustrate, the following stub of JavaScript::Minifier understands both old-fashioned and modern ways of passing filehandles.

package JavaScript::Minifier;

use warnings;
use strict;

sub new { bless {} => shift }

sub minify {
  my($self,$in,$out) = @_;

  for ($in, $out) {
    no strict 'refs';
    next if ref($_) || ref(\$_) eq "GLOB";

    my $pkg = caller;
    $_ = *{ $pkg . "::" . $_ }{IO};
  }

  while (<$in>) { print $out $_ }
}

1;

Output:

$ ./prog.pl
Name "main::OP_HANDLE" used only once: possible typo at ./prog.pl line 7.
Ahoy there
matey!
Ahoy there
matey!
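If you would rather not switch off strict 'refs' yourself, the core Symbol module's qualify_to_ref can turn a handle name into a glob reference. The sketch below is an alternative to the stub above, not what the real JavaScript::Minifier does; to_handle and the OUT handle are hypothetical names.

```perl
use strict;
use warnings;
use Symbol qw(qualify_to_ref);

# Hypothetical helper: accepts a handle name, a bare typeglob, a glob
# reference, or a lexical handle, and returns something print can use.
sub to_handle {
    my ($thing, $pkg) = @_;
    return $thing if ref $thing;                # glob ref or lexical handle
    return $thing if ref(\$thing) eq 'GLOB';    # bare typeglob held in a scalar
    return qualify_to_ref($thing, $pkg);        # plain string naming a handle
}

my $buffer = '';
open OUT, '>', \$buffer or die "open: $!";

# The same handle, passed three different ways.
for my $arg ('OUT', *OUT, \*OUT) {
    my $fh = to_handle($arg, __PACKAGE__);
    print {$fh} "ok\n";
}

close OUT or die "close: $!";
```

The package name is passed explicitly here; a library would more likely use caller so that names resolve in the calling package, as the stub does.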
Greg Bacon answered Nov 13 '22 12:11


The * refers to a Perl "typeglob", which is an obscure implementation detail of Perl. Some older Perl code needs to refer to file handles using typeglobs (since there wasn't any other way to do it at the time). More modern code can use filehandle references instead, which are easier to work with.

The * is analogous to $ or %: it refers to a different kind of object known by the same name.

From the perldata documentation page:

Perl uses an internal type called a typeglob to hold an entire symbol table entry. The type prefix of a typeglob is a * , because it represents all types. This used to be the preferred way to pass arrays and hashes by reference into a function, but now that we have real references, this is seldom needed.

Greg Hewgill answered Nov 13 '22 12:11