I have a perl script that goes through a folder with a couple of thousand files.
When I started writing the script I was unaware of the perl File::Find functions, so in order to list all the files in the structure I used something along the line of:
open (FILES, "$FIND $FOLDER -type f |");
while (my $line = <FILES>) {...}
Now however I figured I would try doing this from perl instead of launching a external program. (No real reason to do this change other than wanting to learn to use File::Find.)
Trying to learn the semantics of File::Find find function I tried a few things on the command line and compared the output to that of find.
Oddly enough there is 1 file that the program find finds but the perl function skips.
Find works:
machine:~# find /search/path -type f | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db
machine:~# find /search/path -type f | wc -l
6439
Perl fails:
machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db
machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | wc -l
6438
Changing to exclude folders rather than include files works:
machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" unless -d }, "/search/path");' | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db
Only difference between the files is the size:
machine:~# ls -l /search/path/folder/folder/UNIQ/
total 4213008
-rw-rw-r-- 1 user users 4171336632 May 27 2012 movie_file_015.MOV
-rw-rw-r-- 1 user users 141610616 May 27 2012 movie_file_145.MOV
-rw-rw-r-- 1 user users 20992 May 27 2012 Thumbs.db
Perl on the machine in question is old but not ancient:
machine:~# perl -version
This is perl, v5.8.8 built for sparc-linux
Copyright 1987-2006, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.
Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl". If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.
Is this a known bug or something?
Or am I hitting some size limit of '-f'? The file is almost 4gb and the largest in the selection.
Or is my test (if -f) poorly chosen?
EDIT [trying to stat files]:
Big file fails
machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"));'
Small file works
machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"));'
$VAR1 = 65024;
$VAR2 = 19989500;
$VAR3 = 33204;
$VAR4 = 1;
$VAR5 = 1004;
$VAR6 = 100;
$VAR7 = 0;
$VAR8 = 141610616;
$VAR9 = 1349281585;
$VAR10 = 1338096718;
$VAR11 = 1352403842;
$VAR12 = 16384;
$VAR13 = 276736;
Binary 'stat' works on both files
machine:~# stat /search/path/folder/folder/UNIQ/movie_file_015.MOV
File: "/search/path/folder/folder/UNIQ/movie_file_015.MOV"
Size: 4171336632 Blocks: 8149216 IO Block: 16384 Regular File
Device: fe00h/65024d Inode: 19989499 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1004/user) Gid: ( 100/ users)
Access: 2012-10-03 18:11:05.000000000 +0200
Modify: 2012-05-27 07:23:34.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100
machine:~# stat /search/path/folder/folder/UNIQ/movie_file_145.MOV
File: "/search/path/folder/folder/UNIQ/movie_file_145.MOV"
Size: 141610616 Blocks: 276736 IO Block: 16384 Regular File
Device: fe00h/65024d Inode: 19989500 Links: 1
Access: (0664/-rw-rw-r--) Uid: ( 1004/user) Gid: ( 100/ users)
Access: 2012-10-03 18:26:25.000000000 +0200
Modify: 2012-05-27 07:31:58.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100
Also:
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $! . "\n";'
Bad file descriptor
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $! . "\n";'
Value too large for defined data type
EDIT2:
# perl -V | grep "uselargefiles|FILE_OFFSET_BITS"
config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=sparc-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Dstatic_ext=B ByteLoader GDBM_File POSIX re -Dusemymalloc -Uuselargefiles -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
useperlio=define d_sfio=undef uselargefiles=undef usesocks=undef
Problem "solved":
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $!{EOVERFLOW} . "\n";'
92
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $!{EOVERFLOW} . "\n";'
0
Works:
# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f or ( $!{EOVERFLOW} > 0 and not -d) }, "/search/path");' | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db
Based on a bit of Googling, it looks like your perl interpreter has not been compiled with large file support, causing stat
(and any file tests that rely on it internally, including -f
) to fail for files larger than 2GB.
To check whether this is the case, run:
perl -V | grep "uselargefiles|FILE_OFFSET_BITS"
If your perl has large file support, the output should show something like uselargefiles=define
and -D_FILE_OFFSET_BITS=64
. If it doesn't, it's likely that you perl does not support large files.
It may be somewhat puzzling why large file support is needed even for just stat
ing files. The underlying problem is that the 32-bit version of the stat(2) system call, rather than returning a bogus size, simply fails with EOVERFLOW
if applied to a file larger than 2GB:
"EOVERFLOW
(stat()) path refers to a file whose size cannot be represented in the type off_t. This can occur when an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bits."
Technically, receiving that error should be enough to indicate that the named file does exist (although I guess it could be a truly humongous directory too), but perl is not smart enough to realize that — it just sees that the stat failed, and so returns nothing.
(Edit: As ikegami correctly notes in the comments, -f
returns undef
rather than 0 or 1 if the stat(2) call fails, and sets $!
to the error code that caused the failure. So, if you don't mind assuming that all directory entries with size > 2GB are files, you could do something like -f $_ or (not defined -f _ and $!{EOVERFLOW})
to check for it.)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With