How to improve performance of File::Find::Rule calls?

I am using File::Find::Rule to locate one-level-deep user-executable folders in a directory specified in $dir:

my @subDirs = File::Find::Rule->permissions(isExecutable => 1, user => $uid)
                              ->extras({ follow => 1, follow_skip => 2 })
                              ->directory
                              ->maxdepth(1)
                              ->in($dir);

Here is the rough equivalent, using the UNIX find utility:

my $subDirStr = `find $dir -maxdepth 1 -type d -user $username -perm -100`;
chomp($subDirStr); 
my @subDirs = split("\n", $subDirStr);

Both are run in scripts that have permission to read this data.

If I run the find command on the command line, the results come back instantaneously.

If I run either of the above statements from a Perl script, the results take several seconds to come back.

What can I do programmatically to improve the performance of either of the two Perl approaches?

asked Feb 19 '11 by Alex Reynolds

2 Answers

I suspect that the delay you are seeing is due to the length of time it takes to produce all the results. Sure, if you pipe your find command into less, you get results immediately, but if you pipe it into tail you might see a delay similar to what you see with your Perl script.

In both of your implementations, you are building an array containing every matching path, so your code will not continue until the matching process is complete.
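
One way to confirm this is to time the blocking call as a whole. A minimal sketch, assuming the same $dir and $uid as in the question, and that the File::Find::Rule::Permissions add-on (which supplies the permissions() test) is installed:

use File::Find::Rule;
use File::Find::Rule::Permissions;              # provides the permissions() test
use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
my @subDirs = File::Find::Rule->permissions(isExecutable => 1, user => $uid)
                              ->extras({ follow => 1, follow_skip => 2 })
                              ->directory
                              ->maxdepth(1)
                              ->in($dir);
printf "matched %d directories in %.3fs\n", scalar @subDirs, tv_interval($t0);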

You could alternatively use an iterator approach like this:

my $rule = File::Find::Rule->permissions(isExecutable => 1, user => $uid)
                           ->extras({ follow => 1, follow_skip => 2 })
                           ->directory
                           ->maxdepth(1)
                           ->start($dir);
while( defined ( my $path = $rule->match ) ) {
    ...
}
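
As an illustration of what might go in that loop body, here is a variant of the same loop that reports how quickly each match arrives (the printf is just a placeholder; Time::HiRes is a core module):

use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
while ( defined( my $path = $rule->match ) ) {
    printf "got %s after %.3fs\n", $path, tv_interval($t0);   # the first match should print almost immediately
}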

For completeness, you could achieve a similar result with the find command. Instead of using backticks, you could explicitly use a pipe and read results one at a time:

open my $pipe, '-|', "find $dir -maxdepth 1 -type d -user $username -perm -100"
    or die "Can't run find: $!";
while (my $path = <$pipe>) {
    chomp $path;    # find's output is newline-terminated
    ...
}
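
Whichever loop body you use, it's worth closing the pipe explicitly; on a piped open, close waits for the child and returns false if find failed, leaving its exit status in $?:

close $pipe
    or warn $! ? "Error closing find pipe: $!\n"
               : "find exited with status " . ($? >> 8) . "\n";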

Note that with both these examples, your code can start processing results as soon as the first match is found. However, the total time taken until the last result is processed shouldn't be much different to your original code.

answered Nov 17 '22 by Grant McLean


I'm going to ignore the File::Find::Rule part for the moment and focus on the difference between running find from the command line and running find from backticks in Perl.

First, please verify that a script that does nothing but the find command above still has the problem, run by you as the same user, from the same working directory, and on the same directories as the quickly running command-line invocation.

If it doesn't have the problem, we need to know more about your script. Alternatively, remove things from your script piece by piece until it does nothing but the find command, and see what had to be removed to make the problem go away.

If it does, try using a full path (e.g. /usr/bin/find) instead of just find to eliminate the possibility of PATH differences or shell aliases causing a difference.
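
For example, assuming find lives at /usr/bin/find on your system (check with which find):

my $subDirStr = `/usr/bin/find $dir -maxdepth 1 -type d -user $username -perm -100`;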

Also check that the output of the command-line run and the backticks run is identical.

And try redirecting the output of both to /dev/null (inside the backticks, for the perl version) and see if that makes any difference to the timing.
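
A rough sketch of that timing from the Perl side, using system() since nothing needs to be captured, and assuming the same $dir and $username as above:

use Time::HiRes qw(gettimeofday tv_interval);

my $t0 = [gettimeofday];
system("find $dir -maxdepth 1 -type d -user $username -perm -100 > /dev/null");
printf "find with output discarded took %.3fs\n", tv_interval($t0);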

answered Nov 17 '22 by ysth