PHPExcel exceeds memory usage only with XLS files

I'm having a memory issue with PHPExcel when processing an XLS file. I have to work with quite big files (between 50k and 200k rows, and 9-10 columns), so I had to use read filters to solve the memory issue.

However, although this worked pretty well with XLSX files, using a mix of background processes and some simple calculations for the chunk size, I can't make it work with XLS files.

This is the piece of code where everything explodes:

// Read filter that only accepts cells belonging to the current chunk of rows
class ExcelReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $startRow = 0;
    private $endRow   = 0;

    public function setRows($startRow, $chunkSize) {
        $this->startRow = $startRow;
        $this->endRow   = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        // Only read cells whose row falls inside [startRow, startRow + chunkSize)
        return ($row >= $this->startRow && $row < $this->endRow);
    }
}

PHPExcel_Settings::setCacheStorageMethod( PHPExcel_CachedObjectStorageFactory::cache_in_memory_serialized );

....
$filter = new ExcelReadFilter();
$filter->setRows($desde, $cuantas);

$reader = PHPExcel_IOFactory::createReader($this->file_type);   

$reader->setLoadSheetsOnly($sheet_name);    
$reader->setReadDataOnly(false);    
$reader->setReadFilter($filter);

$chunk = $reader->load($this->file);
$chunk->setActiveSheetIndexByName($sheet_name);

$active_sheet = $chunk->getActiveSheet();
$rowIterator = $active_sheet->getRowIterator($desde);
$this->num_filas = $active_sheet->getHighestRow();

Immediately after that, I included the following lines to have a better idea of what was happening:

ob_start();
var_dump($rowIterator);

// Capture the dump to a file for inspection; ob_get_clean() already
// closes the buffer, so no separate ob_end_clean() is needed
$f = fopen("excel-info.txt", "w");
fwrite($f, ob_get_clean());
fclose($f);

die;
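(As a side note, the same dump can be captured without output buffering at all; print_r() with its second parameter set to true simply returns the dump as a string:)

// Equivalent one-liner: print_r(..., true) returns the representation as a string
file_put_contents("excel-info.txt", print_r($rowIterator, true));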

And I think that pointed to where the memory issue is. When I first uploaded the original XLS file, excel-info.txt was 13 MB in size. Then I opened the XLS file, saved it as XLSX, and repeated the process; this time excel-info.txt was only 285 KB.

Is there any way I can modify those filters to work with XLS files?

Oh, and setting the PHP memory limit to a higher value is not an option, but execution time is not critical.

ADDED

By trying different cell caching options, I was able to reduce the memory usage enough to make it work, and in most cases keep it at an acceptable level.

Right now, I'm using PHPExcel_CachedObjectStorageFactory::cache_to_sqlite, and it seems to be enough to make it work.
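For completeness, this is roughly how the cache method is switched (a minimal sketch following the PHPExcel cell caching documentation; it has to run before the reader and workbook are created, and cache_to_sqlite needs the sqlite extension to be available):

// Store cached cells in an SQLite database on disk instead of in PHP memory
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_sqlite;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
    die($cacheMethod . " caching is not available, is the sqlite extension loaded?");
}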

I'd like to point out that the calculation I made over the serialized information written to the file was incorrect. With Excel5 files, the dump produces an array with as many records as the Excel file has rows, with every value that does not meet the filter condition set to NULL. Of course, when I save it to a text file, having something like...

array(10) {
  ["A"]=>
  NULL
  ["B"]=>
  NULL
  ["C"]=>
  NULL
  ["D"]=>
  NULL
  ["E"]=>
  NULL
  ["F"]=>
  NULL
  ["G"]=>
  NULL
  ["H"]=>
  NULL
  ["I"]=>
  NULL
  ["J"]=>
  NULL
} 

...takes a lot of space in the text file, but should not take much PHP memory, so that one was my fault.

Now, I'm using this code to keep track of memory usage:

for ($i = 1; $i < 20000; $i += 5000) {
    $filter->setRows($i, 5000);
    echo "\n1- Usage: " . (memory_get_usage() / 1024) / 1024;
    $objPHPExcel = $reader->load($this->file);
    echo "\n2- Usage: " . (memory_get_usage() / 1024) / 1024;
    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);
    unset($sheetData);
    unset($objPHPExcel);
    echo "\n3- Usage: " . (memory_get_usage() / 1024) / 1024;
}

With a certain XLS file, it shows:

1st Iteration 1- Usage: 4.3859634399414 2- Usage: 34.292671203613 3- Usage: 34.68034362793

2nd Iteration 1- Usage: 34.68034362793 2- Usage: 34.68293762207 3- Usage: 34.684982299805

And the same file, after saving as XLSX:

1st Iteration 1- Usage: 4.2780990600586 2- Usage: 6.9042129516602 3- Usage: 7.2916641235352

2nd Iteration 1- Usage: 7.2916641235352 2- Usage: 7.5115432739258 3- Usage: 7.2813568115234

I must say, however, that after saving as XLSX the file size is reduced by approximately half, so I cannot say whether this is a bug or expected behavior.
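One more thing that helps keep the numbers from creeping up between iterations (a sketch, not part of the measurements above; disconnectWorksheets() is the clean-up method PHPExcel provides for this) is to break the workbook/worksheet references explicitly before unsetting:

// Break the circular workbook <-> worksheet references so the garbage
// collector can actually reclaim the memory between chunks
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);
gc_collect_cycles();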

asked Sep 20 '14 by sergio0983

1 Answer

PHPExcel is a memory hog. I have used it for several clients, and found that you have to experiment with the PHP memory_limit setting to find the sweet spot where it can load the average file the client is likely to import. I have had to use as much as 8 GB on some projects. Of course, you do that in the routine that loads the xls file, with something like ini_set('memory_limit', '8192M'), not in the php.ini file.

Have you tried setReadDataOnly(true)?

I think the reason is that xls files are not just CSV-style tabular data, but a catch-all for lots of other information (like fonts, formatting and macros). When you load the file, PHPExcel tries to load all of those parts into memory, creating a huge structure.
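Something along these lines, combining that with the read filter from the question (a sketch only; whether setReadDataOnly(true) is acceptable depends on whether you need formatting and date handling, and the memory_limit value is just an example):

// Raise the limit only for the import routine (if the host allows it)
ini_set('memory_limit', '1024M');           // example value, adjust as needed

$reader = PHPExcel_IOFactory::createReader('Excel5');   // the XLS reader
$reader->setReadDataOnly(true);             // skip style/formatting information
$reader->setLoadSheetsOnly($sheet_name);    // same sheet as in the question
$reader->setReadFilter($filter);            // same chunking filter as in the question
$chunk = $reader->load($this->file);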

answered Sep 21 '22 by TomTerrific