Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

perl -f check fails to identify file

Tags:

file

perl

I have a perl script that goes through a folder with a couple of thousand files.

When I started writing the script I was unaware of the perl File::Find functions, so in order to list all the files in the structure I used something along the line of:

open (FILES, "$FIND $FOLDER -type f |");
while (my $line = <FILES>) {...}

Now however I figured I would try doing this from perl instead of launching a external program. (No real reason to do this change other than wanting to learn to use File::Find.)

Trying to learn the semantics of File::Find find function I tried a few things on the command line and compared the output to that of find.

Oddly enough there is 1 file that the program find finds but the perl function skips.

Find works:

machine:~# find /search/path -type f | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

machine:~# find /search/path -type f | wc -l
    6439

Perl fails:

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | grep  UNIQ
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f }, "/search/path");' | wc -l
    6438

Changing to exclude folders rather than include files works:

machine:~# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" unless -d }, "/search/path");' | grep  UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV
/search/path/folder/folder/UNIQ/movie_file_145.MOV
/search/path/folder/folder/UNIQ/Thumbs.db

Only difference between the files is the size:

machine:~# ls -l /search/path/folder/folder/UNIQ/
total 4213008
-rw-rw-r--    1 user users    4171336632 May 27  2012 movie_file_015.MOV
-rw-rw-r--    1 user users    141610616 May 27  2012 movie_file_145.MOV
-rw-rw-r--    1 user users       20992 May 27  2012 Thumbs.db

Perl on the machine in question is old but not ancient:

machine:~# perl -version

This is perl, v5.8.8 built for sparc-linux

Copyright 1987-2006, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using "man perl" or "perldoc perl".  If you have access to the
Internet, point your browser at http://www.perl.org/, the Perl Home Page.

Is this a known bug or something?

Or am I hitting some size limit of '-f'? The file is almost 4gb and the largest in the selection.

Or is my test (if -f) poorly chosen?

EDIT [trying to stat files]:

Big file fails

machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"));'

Small file works

machine:~# perl -e 'use Data::Dumper; print Dumper(stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"));'
$VAR1 = 65024;
$VAR2 = 19989500;
$VAR3 = 33204;
$VAR4 = 1;
$VAR5 = 1004;
$VAR6 = 100;
$VAR7 = 0;
$VAR8 = 141610616;
$VAR9 = 1349281585;
$VAR10 = 1338096718;
$VAR11 = 1352403842;
$VAR12 = 16384;
$VAR13 = 276736;

Binary 'stat' works on both files

machine:~# stat /search/path/folder/folder/UNIQ/movie_file_015.MOV
  File: "/search/path/folder/folder/UNIQ/movie_file_015.MOV"
  Size: 4171336632  Blocks: 8149216    IO Block: 16384  Regular File
Device: fe00h/65024d        Inode: 19989499    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1004/user)   Gid: (  100/   users)
Access: 2012-10-03 18:11:05.000000000 +0200
Modify: 2012-05-27 07:23:34.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100

machine:~# stat /search/path/folder/folder/UNIQ/movie_file_145.MOV
  File: "/search/path/folder/folder/UNIQ/movie_file_145.MOV"
  Size: 141610616   Blocks: 276736     IO Block: 16384  Regular File
Device: fe00h/65024d        Inode: 19989500    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1004/user)   Gid: (  100/   users)
Access: 2012-10-03 18:26:25.000000000 +0200
Modify: 2012-05-27 07:31:58.000000000 +0200
Change: 2012-11-08 20:44:02.000000000 +0100

Also:

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $! . "\n";'
Bad file descriptor

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $! . "\n";'
Value too large for defined data type

EDIT2:

# perl -V | grep "uselargefiles|FILE_OFFSET_BITS"
config_args='-Dccflags=-DDEBIAN -Dcccdlflags=-fPIC -Darchname=sparc-linux -Dprefix=/usr -Dprivlib=/usr/share/perl/5.8 -Darchlib=/usr/lib/perl/5.8 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.8.8 -Dsitearch=/usr/local/lib/perl/5.8.8 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Dstatic_ext=B ByteLoader GDBM_File POSIX re -Dusemymalloc -Uuselargefiles -Uafs -Ud_csh -Uusesfio -Uusenm -Duseshrplib -Dlibperl=libperl.so.5.8.8 -Dd_dosuid -des'
useperlio=define d_sfio=undef uselargefiles=undef usesocks=undef

Problem "solved":

machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_015.MOV"); print $!{EOVERFLOW} . "\n";'
92
machine:~# perl -e 'stat("/search/path/folder/folder/UNIQ/movie_file_145.MOV"); print $!{EOVERFLOW} . "\n";'
0

Works:

# perl -e 'use File::Find; find(sub { print $File::Find::name . "\n" if -f or ( $!{EOVERFLOW} > 0 and not -d) }, "/search/path");' | grep UNIQ
/search/path/folder/folder/UNIQ/movie_file_015.MOV 
/search/path/folder/folder/UNIQ/movie_file_145.MOV 
/search/path/folder/folder/UNIQ/Thumbs.db
like image 474
azzid Avatar asked Dec 26 '12 11:12

azzid


1 Answers

Based on a bit of Googling, it looks like your perl interpreter has not been compiled with large file support, causing stat (and any file tests that rely on it internally, including -f) to fail for files larger than 2GB.

To check whether this is the case, run:

perl -V | grep "uselargefiles|FILE_OFFSET_BITS"

If your perl has large file support, the output should show something like uselargefiles=define and -D_FILE_OFFSET_BITS=64. If it doesn't, it's likely that you perl does not support large files.

It may be somewhat puzzling why large file support is needed even for just stating files. The underlying problem is that the 32-bit version of the stat(2) system call, rather than returning a bogus size, simply fails with EOVERFLOW if applied to a file larger than 2GB:

"EOVERFLOW

(stat()) path refers to a file whose size cannot be represented in the type off_t. This can occur when an application compiled on a 32-bit platform without -D_FILE_OFFSET_BITS=64 calls stat() on a file whose size exceeds (1<<31)-1 bits."

Technically, receiving that error should be enough to indicate that the named file does exist (although I guess it could be a truly humongous directory too), but perl is not smart enough to realize that — it just sees that the stat failed, and so returns nothing.

(Edit: As ikegami correctly notes in the comments, -f returns undef rather than 0 or 1 if the stat(2) call fails, and sets $! to the error code that caused the failure. So, if you don't mind assuming that all directory entries with size > 2GB are files, you could do something like -f $_ or (not defined -f _ and $!{EOVERFLOW}) to check for it.)

like image 77
Ilmari Karonen Avatar answered Oct 05 '22 15:10

Ilmari Karonen