How can my Perl script determine whether an Excel file is in XLS or XLSX format?

I have a Perl script that reads data from an Excel (xls) binary file. But the client that sends us these files has started sending us XLSX format files at times. I've updated the script to be able to read those as well. However, the client sometimes likes to name the XLSX files with an .xls extension, which currently confuses the heck outta my script since it uses the file name to determine which file type it is.

An XLSX file is a zip file that contains XML stuff. Is there a simple way for my script to look at the file and tell whether it's a zip file or not? If so, I can make my script go by that instead of just the file name.

DaveKub asked Oct 27 '10 18:10


4 Answers

Yes, you can do this by checking the file's magic number.

There are quite a few Perl modules for checking the magic number of a file.

An example using File::LibMagic:

use strict;
use warnings;

use File::LibMagic;

my $filename = shift or die "Usage: $0 FILE\n";
my $lm       = File::LibMagic->new();

# Detect the type once; the exact string can vary with the libmagic version.
my $type = $lm->checktype_filename($filename);

if ( $type eq 'application/zip; charset=binary' ) {
    # XLSX format
}
elsif ( $type eq 'application/vnd.ms-office; charset=binary' ) {
    # XLS format
}

Another example, using File::Type:

use strict;
use warnings;

use File::Type;

my $file = shift or die "Usage: $0 FILE\n";
my $ft   = File::Type->new();

if ( $ft->mime_type($file) eq 'application/zip' ) {
    # XLSX format
}
else {
    # probably XLS format
}
Alan Haggai Alavi answered Sep 28 '22 19:09


Because an .xlsx file is a zip archive, its first 2 bytes are 'PK', so simply opening the file and examining those 2 bytes will do.
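A minimal sketch of that check, using only core Perl. The helper name excel_format is hypothetical. It compares against the full 4-byte zip local file header signature ("PK\x03\x04") rather than just 'PK', and it also tests for the 8-byte OLE2 compound document signature that legacy .xls files start with, so an unrelated file is reported as unknown instead of being assumed to be XLS.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 'xlsx', 'xls', or 'unknown' from the leading bytes.
sub excel_format {
    my ($path) = @_;

    # Read in raw mode so no encoding layer alters the bytes.
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $header = '';
    my $read   = read $fh, $header, 8;
    close $fh;
    die "Cannot read $path: $!" unless defined $read;

    # Zip archives (and therefore XLSX) start with "PK\x03\x04".
    return 'xlsx' if substr( $header, 0, 4 ) eq "PK\x03\x04";

    # OLE2 compound documents (legacy XLS) start with D0 CF 11 E0 A1 B1 1A E1.
    return 'xls' if $header eq "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

    return 'unknown';
}
```

Calling excel_format($filename) before choosing a parser lets the script ignore the extension entirely. Note that a 'xlsx' result only means "this is a zip file"; any other zip archive would match too.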

like image 36
Bruce Armstrong Avatar answered Sep 28 '22 19:09

Bruce Armstrong


Edit: Archive::Zip is a better solution:

use Archive::Zip qw( :ERROR_CODES );

# Read a Zip file
my $somezip = Archive::Zip->new();
unless ( $somezip->read( 'someZip.zip' ) == AZ_OK ) {
    die 'read error';
}
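A successful read only shows that the file is a zip archive. As a hedged sketch, the check can be narrowed by also looking for xl/workbook.xml, the member where XLSX packages conventionally keep the workbook. The helper name looks_like_xlsx is hypothetical, and the member check is an assumption about how the client's files are laid out, not something the original answer states.

```perl
use strict;
use warnings;

use Archive::Zip qw( :ERROR_CODES );

# Hypothetical helper: true if $path is a readable zip archive that
# contains xl/workbook.xml (assumed here to mark an XLSX package).
sub looks_like_xlsx {
    my ($path) = @_;

    # Archive::Zip reports read failures through its error handler;
    # silence it for this probe, then put the previous handler back.
    my $old_handler = Archive::Zip::setErrorHandler( sub { } );

    my $zip = Archive::Zip->new();
    my $ok  = $zip->read($path) == AZ_OK;

    Archive::Zip::setErrorHandler($old_handler);

    return 0 unless $ok;
    return defined $zip->memberNamed('xl/workbook.xml') ? 1 : 0;
}
```

If looks_like_xlsx($filename) returns false, the script can fall back to its XLS reader.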
weismat answered Sep 28 '22 19:09


Use File::Type:

use File::Type;

my $file = "foo.zip";
my $filetype = File::Type->new( );

if( $filetype->mime_type( $file ) eq 'application/zip' ) {
  # File is a zip archive.
  ...
}

I just tested it with a .xlsx file, and mime_type() returned application/zip. For a .xls file, mime_type() returned application/octet-stream.

CanSpice answered Sep 28 '22 20:09