 

Perl: reading a huge Excel file

I have a huge xlsx file (about 127 MB) and want to read it using the Spreadsheet::XLSX module, but I am getting "Out of Memory" errors on a machine with 2 GB of RAM. (Note: the script works fine with smaller Excel 2007 files.)

Is there any way to read the Excel file line by line without hitting the memory limit? Searching Google, I came across http://discuss.joelonsoftware.com/default.asp?joel.3.160328.14, but I am not familiar with how to store the spreadsheet in a scalar. Can someone give me an example of reading an Excel 2007 file as a scalar and printing the cell values? Below is the script I am currently running on smaller spreadsheets.

#!/usr/bin/perl
use strict;
use warnings;
use Excel::Writer::XLSX;
use Spreadsheet::XLSX;

my $workbook    = Excel::Writer::XLSX->new('Book1.xlsx');
my $worksheet   = $workbook->add_worksheet();
my $excel       = Spreadsheet::XLSX->new('Book2.xlsx');
my $date_format = $workbook->add_format();
$date_format->set_num_format('dd/mm/yy hh:mm');

# Columns of interest
my @columns    = (0, 1, 2, 5, 9, 10, 12, 13, 31);
# Strings to look for, and the replacement written for each one
my @reportlist = ("string1", "String2", "String3");
my @actuallist = ("ModifiedString1", "ModifiedString2", "ModifiedString3");

foreach my $sheet (@{ $excel->{Worksheet} }) {
    printf("Sheet: %s\n", $sheet->{Name});
    $sheet->{MaxRow} ||= $sheet->{MinRow};
    foreach my $row ($sheet->{MinRow} .. $sheet->{MaxRow}) {
        for my $c (0 .. $#columns) {
            my $col  = $columns[$c];
            my $cell = $sheet->{Cells}[$row][$col];
            next unless $cell;    # skip empty cells

            my $val = $cell->{Val} // '';
            if ($col == 0) {
                # Strip the timezone suffix and write with the date format
                $val =~ s/ GMT\+11:00//g;
                $worksheet->write($row, $c, $val, $date_format);
            }
            else {
                $worksheet->write($row, $c, $val);
            }

            # Overwrite the cell if it matches one of the report strings
            # (\Q...\E treats the search string literally, not as a regex)
            for my $z (0 .. $#reportlist) {
                if ($val =~ m/\Q$reportlist[$z]\E/i) {
                    $worksheet->write($row, $c, $actuallist[$z]);
                }
            }
        }
    }
}
$workbook->close();
Linus asked Jan 20 '23 at 08:01


1 Answer

I'm working on a new module for fast and memory-efficient reading of Excel xlsx files with Perl. It isn't on CPAN yet (it needs a good bit more work), but you can get it on GitHub.

Here is an example of how to use it:

use strict;
use warnings;
use Excel::Reader::XLSX;

my $reader   = Excel::Reader::XLSX->new();
my $workbook = $reader->read_file( 'Book1.xlsx' );

if ( !defined $workbook ) {
    die $reader->error(), "\n";
}

for my $worksheet ( $workbook->worksheets() ) {

    my $sheetname = $worksheet->name();

    print "Sheet = $sheetname\n";

    while ( my $row = $worksheet->next_row() ) {

        while ( my $cell = $row->next_cell() ) {

            my $row_num = $cell->row();
            my $col_num = $cell->col();
            my $value   = $cell->value();

            print "  Cell ($row_num, $col_num) = $value\n";
        }
    }
}

__END__

Update: This module never made it to CPAN quality. Try Spreadsheet::ParseXLSX instead.
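To illustrate the general idea behind streaming readers like the one above (and the "an xlsx file is just zipped XML" approach from the linked thread), here is a rough sketch that uses only core Perl modules (IO::Uncompress::Unzip, core since Perl 5.10). It is not how Excel::Reader::XLSX or Spreadsheet::ParseXLSX work internally. The subroutine names (xml_unescape, col_index, stream_elements, read_shared_strings, stream_sheet) are hypothetical, the worksheet is assumed to live at xl/worksheets/sheet1.xml (real workbooks map sheet names to files through xl/workbook.xml and its relationships), and the XML is picked apart with regexes rather than a real parser. Only the shared string table and the unparsed tail of the sheet XML (roughly one 64 KB chunk plus a partial row) are held in memory at once, although on a 127 MB file the shared string table alone may still be large.

```perl
#!/usr/bin/perl
# Rough sketch (hypothetical helper names): stream cells out of an .xlsx
# without loading the whole worksheet. Assumes the sheet lives at
# xl/worksheets/sheet1.xml and uses regexes instead of a real XML parser.
use strict;
use warnings;
use IO::Uncompress::Unzip;

# Decode the five predefined XML entities (numeric references are ignored).
sub xml_unescape {
    my ($s) = @_;
    my %ent = (lt => '<', gt => '>', quot => '"', apos => "'", amp => '&');
    $s =~ s/&(lt|gt|quot|apos|amp);/$ent{$1}/g;
    return $s;
}

# Convert column letters to a 0-based index: A => 0, Z => 25, AA => 26.
sub col_index {
    my ($letters) = @_;
    my $n = 0;
    $n = $n * 26 + ord($_) - ord('A') + 1 for split //, $letters;
    return $n - 1;
}

# Read one ZIP member in 64 KB chunks and pass each complete <$tag> element
# (or self-closing <$tag/>) to $callback. Only the unparsed tail is buffered.
# Returns 0 if the member cannot be opened.
sub stream_elements {
    my ($file, $member, $tag, $callback) = @_;
    my $z = IO::Uncompress::Unzip->new($file, Name => $member) or return 0;
    my ($buf, $chunk, $status) = ('', '');
    while (($status = $z->read($chunk, 65536)) > 0) {
        $buf .= $chunk;
        while ($buf =~ s{\A.*?(<$tag\b(?:[^>]*?/>|[^>]*>.*?</$tag>))}{}s) {
            my $element = $1;
            $callback->($element);
        }
    }
    die "Error reading $member from $file\n" if $status < 0;
    $z->close;
    return 1;
}

# Load the shared string table (still fully in memory) into an array.
sub read_shared_strings {
    my ($file) = @_;
    my @strings;
    stream_elements($file, 'xl/sharedStrings.xml', 'si', sub {
        my ($si) = @_;
        $si =~ s{<rPh\b.*?</rPh>}{}gs;    # drop phonetic runs
        my $text = '';
        while ($si =~ m{<t\b[^>]*?(?:/>|>(.*?)</t>)}gs) {
            $text .= $1 // '';            # concatenate rich-text runs
        }
        push @strings, xml_unescape($text);
    });
    return \@strings;
}

# Call $cell_cb->($row, $col, $value) for every non-empty cell, row by row.
# Rows and columns are 0-based; dates come out as Excel serial numbers.
sub stream_sheet {
    my ($file, $sheet_member, $cell_cb) = @_;
    my $shared = read_shared_strings($file);
    stream_elements($file, $sheet_member, 'row', sub {
        my ($row_xml) = @_;
        while ($row_xml =~ m{<c\b([^>]*?)(?:/>|>(.*?)</c>)}gs) {
            my ($attrs, $inner) = ($1, $2 // '');
            my ($letters, $rownum) = $attrs =~ /\br="([A-Z]+)(\d+)"/ or next;
            my ($type) = $attrs =~ /\bt="([^"]+)"/;
            $type //= 'n';
            my ($value) = $type eq 'inlineStr'
                ? $inner =~ m{<t\b[^>]*>(.*?)</t>}s    # first text run only
                : $inner =~ m{<v>(.*?)</v>}s;
            next unless defined $value;               # styled but empty cell
            $value = $type eq 's' ? $shared->[$value] : xml_unescape($value);
            $cell_cb->($rownum - 1, col_index($letters), $value);
        }
    }) or die "Cannot open worksheet '$sheet_member' in $file\n";
}

# Example: perl stream_xlsx.pl Book2.xlsx [xl/worksheets/sheet2.xml]
if (@ARGV) {
    my $file   = $ARGV[0];
    my $member = $ARGV[1] // 'xl/worksheets/sheet1.xml';
    stream_sheet($file, $member, sub {
        my ($row, $col, $value) = @_;
        print "Cell ($row, $col) = $value\n";
    });
}
```

Saved as, say, stream_xlsx.pl and run as `perl stream_xlsx.pl Book2.xlsx`, it prints each cell as soon as its row has been decompressed, with 0-based row and column indices like the `$sheet->{Cells}[$row][$col]` indices in the question's script. Values are returned as raw UTF-8 bytes. For anything beyond a quick experiment, Spreadsheet::ParseXLSX is the maintained option.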

jmcnamara answered Jan 30 '23 at 09:01