I have an 18 MB Excel spreadsheet to parse, and Spreadsheet::ParseExcel was consuming so much memory that I had to switch to Spreadsheet::ParseExcel::Stream. It works fine on my VM and on our staging server, but on our production server (configured the same way) I get this error:
Can't call method "transfer" on an undefined value at \
lib/Spreadsheet/ParseExcel/Stream/XLS.pm line 31.
That comes from the following bit of code:
    my ($wb, $idx, $row, $col, $cell);
    my $tmp = my $handler = sub {
        ($wb, $idx, $row, $col, $cell) = @_;
        $parser->transfer($main);    # XXX here's where we die
    };
    my $tmp_p = $parser = Coro::State->new(sub {
        $xls->Parse($file);
        # Flag the generator that we're done
        undef $xls;
        # If we don't transfer back when done parsing,
        # it's an implicit program exit (oops!)
        $parser->transfer($main);
    });
    weaken($parser);
The weaken looked suspicious, so I tried weakening only when the refcount was greater than 1, but the same problem occurs. I instrumented the code to get a stack trace and got this:
parser is undefined at lib/Spreadsheet/ParseExcel/Stream/XLS.pm line 29.
Spreadsheet::ParseExcel::Stream::XLS::__ANON__ \
('Spreadsheet::ParseExcel::Workbook=HASH(0x6cd4a08)', 0, 2, 1, \
'Spreadsheet::ParseExcel::Cell=HASH(0x1387ce78)') called at \
/usr/share/perl5/Spreadsheet/ParseExcel.pm line 2152
Spreadsheet::ParseExcel::_NewCell( \
'Spreadsheet::ParseExcel::Workbook=HASH(0x6cd4a08)', 2, 1, \
'Kind', 'PackedIdx', 'Val', 'Dean', 'FormatNo', 25, ...) \
called at /usr/share/perl5/Spreadsheet/ParseExcel.pm line 896
Spreadsheet::ParseExcel::_subLabelSST( \
'Spreadsheet::ParseExcel::Workbook=HASH(0x6cd4a08)', 253, 10, \
'\x{2}\x{0}\x{1}\x{0}\x{19}\x{0}2\x{0}\x{0}\x{0}') \
called at /usr/share/perl5/Spreadsheet/ParseExcel.pm line 292
Spreadsheet::ParseExcel::parse( \
'Spreadsheet::ParseExcel=HASH(0x6cd1810)', '2013-09-13.xls') \
called at lib/Spreadsheet/ParseExcel/Stream/XLS.pm line 35
Spreadsheet::ParseExcel::Stream::XLS::__ANON__ \
called at new_importer.pl line 0
That tells me that the parser read the first and second rows, but it dies on the third row for some reason.
I've tried rebuilding Spreadsheet::ParseExcel::Stream and it doesn't appear to have any errors (all tests pass). I've also recompiled Coro (same result).
I'm mystified. Anyone have any ideas?
The problem turned out to be rather strange. Our code looked like this pseudocode:
    stream1 = open first excel stream
    sheet1 = stream1.sheet // get spreadsheet ready for reading
    if in verbose mode:
        stream2 = open second excel stream
        sheet2 = stream2.sheet
        count++ while sheet2.get_row
        say "We have $count records"
We discovered that if and only if we were in verbose mode would this problem manifest. By having two streams pointing to the same document, our production code would fail, though this worked fine on other boxes. By counting the number of rows and closing that stream before opening the regular stream for reading the document, we solved the problem.
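For anyone hitting the same thing, the reordering can be sketched roughly as below. This is not our production code: count_rows, the command-line handling, and the $verbose flag are made up for illustration, and it assumes that letting the counting stream go out of scope is enough to release it (we have not verified that this is what "closing" requires in every setup). The new, sheet, and row calls follow the Spreadsheet::ParseExcel::Stream synopsis.

```perl
use strict;
use warnings;
use Spreadsheet::ParseExcel::Stream;

# Count the rows of the first sheet using a short-lived stream.
# Assumption: when $stream and $sheet go out of scope at the end of
# this sub, the counting stream is released.
sub count_rows {
    my ($file) = @_;
    my $stream = Spreadsheet::ParseExcel::Stream->new($file);
    my $sheet  = $stream->sheet or return 0;
    my $count  = 0;
    $count++ while $sheet->row;
    return $count;
}

# Hypothetical command line: script.pl FILE [VERBOSE]
my ($file, $verbose) = @ARGV;
if (defined $file) {
    # Finish counting before the main stream exists, so that only
    # one stream points at the document at any time.
    if ($verbose) {
        my $count = count_rows($file);
        print "We have $count records\n";
    }

    my $stream = Spreadsheet::ParseExcel::Stream->new($file);
    my $sheet  = $stream->sheet or die "No worksheet in $file\n";
    while ( my $row = $sheet->row ) {
        # process @$row here
    }
}
```

The only point of the sketch is the ordering: the verbose-mode count runs to completion in its own scope before the stream used for the real import is created.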