Should I manually set Perl's @ARGV so I can use <> to open, scan, and close files?

Tags: file, input, perl

I have recently started learning Perl, and one of my latest assignments involves searching a bunch of files for a particular string. The user provides the directory name as an argument, and the program searches all the files in that directory for the pattern. Using readdir() I have managed to build an array with all the searchable file names, and now I need to search each file for the pattern. My implementation looks something like this:

sub searchDir($) {
    my $dirN = shift;
    my @dirList = glob("$dirN/*");
    my @fileList;    # plain files only
    for (@dirList) {
        push @fileList, $_ if -f $_;
    }
    @ARGV = @fileList;
    while (<>) {
        ## Search for pattern
    }
}

My question is: is it all right to load the @ARGV array manually, as done above, and use the <> operator to read individual lines, or should I open, scan, and close each file individually? Will it make any difference if this processing lives in a subroutine rather than in the main program?

642

asked Feb 03 '09 04:02

aks

2 Answers

On the topic of manipulating @ARGV: that's definitely working code, and Perl certainly allows you to do that. I don't think it's a good coding habit, though. Most of the code I've seen that uses the "while (<>)" idiom uses it to read from standard input, and that's what I would initially expect your code to do. A more readable pattern might be to open and close each input file individually:

foreach my $file (@files) {
    # Three-argument open keeps the mode separate from the file name.
    open FILE, '<', $file or die "Error opening file $file ($!)";
    my @lines = <FILE>;    # slurp every line of the file
    close FILE or die $!;

    foreach my $line (@lines) {
        if ( $line =~ /$pattern/ ) {
            # do something here!
        }
    }
}

That would read more easily to me, although it is a few more lines of code. Perl allows you a lot of flexibility, but I think that makes it that much more important to develop your own style in Perl that's readable and understandable to you (and your co-workers, if that's important for your code/career).

Whether to put this processing in the main program or in a subroutine is also mostly a stylistic decision that you should play around with and think about. Modern computers are so fast at this stuff that style and readability are much more important for scripts like this, as you're not likely to encounter situations in which such a script over-taxes your hardware.
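One practical difference with a subroutine is that @ARGV is a global, so assigning to it inside a sub changes it for the rest of the program. A minimal sketch of one way around that, assuming a hypothetical helper named count_matches (not from the question) and using local so the caller's @ARGV comes back when the sub returns:

```perl
use strict;
use warnings;

# Hypothetical helper: counts the lines in the plain files of $dir
# that match $pattern.
sub count_matches {
    my ($dir, $pattern) = @_;

    # local restores the caller's @ARGV when this sub returns.
    local @ARGV = grep { -f $_ } glob("$dir/*");
    return 0 unless @ARGV;    # with an empty @ARGV, <> would read STDIN

    my $count = 0;
    while (my $line = <>) {
        $count++ if $line =~ /$pattern/;
    }
    return $count;
}
```

Note that glob splits its argument on whitespace, so this sketch assumes a directory name without spaces.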

Good luck! Perl is fun. :)

Edit: It's of course true that with a very large file, you should do something smarter than slurping the entire file into an array. In that case, something like this would definitely be better:

while ( my $line = <FILE> ) {
    if ( $line =~ /$pattern/ ) {
        # do something here!
    }
}

My remark that "you're not likely to encounter situations in which such a script over-taxes your hardware" was meant to cover that; sorry for not being more specific. Besides, who even has 4GB hard drives, let alone 4GB files? :P

Another Edit: After perusing the Internet on the advice of commenters, I've realized that there are hard drives that are much larger than 4GB available for purchase. I thank the commenters for pointing this out, and promise in the future to never-ever-ever try to write a sarcastic comment on the internet.

109

answered Sep 28 '22 04:09

James Thompson

I would prefer this more explicit and readable version:

#!/usr/bin/perl -w

foreach my $file (<$ARGV[0]/*>){
    next unless -f $file;    # skip subdirectories and other non-files
    open(F, '<', $file) or die "$!: $file";
    while(<F>){
      # search for pattern
    }
    close F;
}

But it is also okay to manipulate @ARGV:

#!/usr/bin/perl -w 

@ARGV = <$ARGV[0]/*>;
while(<>){
    # search for pattern
}
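If you take the @ARGV route, Perl still tells you where each line came from: $ARGV holds the name of the file currently being read and $. holds the line number. The following is only a sketch under assumptions: grep_dir is a hypothetical helper name, and the "file:line: text" output format is my own choice, not part of the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one "file:line: text" string per matching line.
sub grep_dir {
    my ($dir, $pattern) = @_;

    # Keep plain files only; glob("$dir/*") also returns subdirectories.
    local @ARGV = grep { -f $_ } glob("$dir/*");
    return () unless @ARGV;    # with an empty @ARGV, <> would read STDIN

    my @hits;
    while (my $line = <>) {
        # $ARGV is the file currently being read, $. its line number.
        push @hits, "$ARGV:$.: $line" if $line =~ /$pattern/;
    }
    continue {
        close ARGV if eof;     # reset $. before the next file starts
    }
    return @hits;
}
```

Without the close ARGV if eof line, $. keeps counting across files, so the reported line numbers would be cumulative rather than per file.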

answered Sep 28 '22 04:09

Frank
