PHPExcel runs out of 256, 512 and also 1024MB of RAM

Tags: php, phpexcel

I don't understand it. The XLSX file is only about 3MB, yet even 1024MB of RAM is not enough for PHPExcel to load it into memory?

I might be doing something horribly wrong here:

function ReadXlsxTableIntoArray($theFilePath)
{
    require_once('PHPExcel/Classes/PHPExcel.php');
    $inputFileType = 'Excel2007';
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    $objReader->setReadDataOnly(true);
    $objPHPExcel = $objReader->load($theFilePath);
    $rowIterator = $objPHPExcel->getActiveSheet()->getRowIterator();
    $arrayData = $arrayOriginalColumnNames = $arrayColumnNames = array();
    foreach($rowIterator as $row){
        $cellIterator = $row->getCellIterator();
        $cellIterator->setIterateOnlyExistingCells(false); // iterate over all cells, even empty ones
        if (1 == $row->getRowIndex()) {
            foreach ($cellIterator as $cell) {
                $value = $cell->getCalculatedValue();
                $arrayOriginalColumnNames[] = $value;
                // remove diacritics
                $value = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $value);
                // and whitespace
                $valueExploded = explode(' ', $value);
                $value = '';
                // capitalize the first letter of each word
                foreach ($valueExploded as $word) {
                    $value .= ucfirst($word);
                }
                $arrayColumnNames[] = $value;
            }
            continue;
        } else {
            $rowIndex = $row->getRowIndex();
            reset($arrayColumnNames);
            foreach ($cellIterator as $cell) {
                $arrayData[$rowIndex][current($arrayColumnNames)] = $cell->getCalculatedValue();
                next($arrayColumnNames);
            }
        }
    }
    return array($arrayOriginalColumnNames, $arrayColumnNames, $arrayData);
}

The function above reads the data from an Excel worksheet into an array.

Any suggestions?

At first, I allowed PHP to use 256MB of RAM. It was not enough. I then doubled the amount and then also tried 1024MB. It still runs out of memory with this error:

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 50331648 bytes) in D:\data\o\WebLibThirdParty\src\PHPExcel\Classes\PHPExcel\Reader\Excel2007.php on line 688

Fatal error (shutdown): Allowed memory size of 1073741824 bytes exhausted (tried to allocate 50331648 bytes) in D:\data\o\WebLibThirdParty\src\PHPExcel\Classes\PHPExcel\Reader\Excel2007.php on line 688
asked Jan 27 '11 by Richard Knop

7 Answers

Plenty has been written about PHPExcel's memory usage on the PHPExcel forum, so reading through some of those previous discussions might give you a few ideas. PHPExcel holds an "in memory" representation of a spreadsheet, so it is susceptible to PHP memory limitations.

The physical size of the file is largely irrelevant... it's much more important to know how many cells (rows*columns on each worksheet) it contains.

The "rule of thumb" that I've always used is an average of about 1k/cell, so a 5M cell workbook is going to require 5GB of memory. However, there are a number of ways that you can reduce that requirement. These can be combined, depending on exactly what information you need to access within your workbook, and what you want to do with it.

If you have multiple worksheets, but don't need to load all of them, then you can limit the worksheets that the Reader will load using the setLoadSheetsOnly() method. To load a single named worksheet:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #2'; 
/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Advise the Reader of which WorkSheets we want to load  **/ 
$objReader->setLoadSheetsOnly($sheetname); 
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Or you can specify several worksheets with one call to setLoadSheetsOnly() by passing an array of names:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls';
$sheetnames = array('Data Sheet #1','Data Sheet #3'); 
/** Create a new Reader of the type defined in $inputFileType **/ 
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader of which WorkSheets we want to load **/ 
$objReader->setLoadSheetsOnly($sheetnames); 
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

If you only need to access part of a worksheet, then you can define a Read Filter to identify just which cells you actually want to load:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example1.xls';
$sheetname = 'Data Sheet #3'; 

/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */ 
class MyReadFilter implements PHPExcel_Reader_IReadFilter {
    public function readCell($column, $row, $worksheetName = '') {
        //  Read rows 1 to 7 and columns A to E only 
        if ($row >= 1 && $row <= 7) {
           if (in_array($column,range('A','E'))) { 
              return true;
           }
        } 
        return false;
    }
}

/**  Create an Instance of our Read Filter  **/ 
$filterSubset = new MyReadFilter(); 
/** Create a new Reader of the type defined in $inputFileType **/ 
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Advise the Reader of which WorkSheets we want to load 
     It's more efficient to limit sheet loading in this manner rather than coding it into a Read Filter  **/ 
$objReader->setLoadSheetsOnly($sheetname); 
echo 'Loading Sheet using filter';
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
$objReader->setReadFilter($filterSubset); 
/**  Load only the rows and columns that match our filter from $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Using read filters, you can also read a workbook in "chunks", so that only a single chunk is memory-resident at any one time:

$inputFileType = 'Excel5'; 
$inputFileName = './sampleData/example2.xls';

/**  Define a Read Filter class implementing PHPExcel_Reader_IReadFilter  */ 
class chunkReadFilter implements PHPExcel_Reader_IReadFilter {
    private $_startRow = 0;
    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */ 
    public function setRows($startRow, $chunkSize) { 
        $this->_startRow    = $startRow; 
        $this->_endRow      = $startRow + $chunkSize;
    } 

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 
           return true;
        }
        return false;
    } 
}

/**  Create a new Reader of the type defined in $inputFileType  **/
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/**  Define how many rows we want to read for each "chunk"  **/ 
$chunkSize = 20;
/**  Create a new Instance of our Read Filter  **/ 
$chunkFilter = new chunkReadFilter(); 
/**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
$objReader->setReadFilter($chunkFilter); 

/**  Loop to read our worksheet in "chunk size" blocks  **/ 
/**  $startRow is set to 2 initially because we always read the headings in row #1  **/
for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) { 
    /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/ 
    $chunkFilter->setRows($startRow,$chunkSize); 
    /**  Load only the rows that match our filter from $inputFileName to a PHPExcel Object  **/ 
    $objPHPExcel = $objReader->load($inputFileName); 
    //    Do some processing here 

    //    Free up some of the memory 
    $objPHPExcel->disconnectWorksheets(); 
    unset($objPHPExcel); 
}

If you don't need to load formatting information, but only the worksheet data, then the setReadDataOnly() method will tell the reader only to load cell values, ignoring any cell formatting:

$inputFileType = 'Excel5';
$inputFileName = './sampleData/example1.xls';
/** Create a new Reader of the type defined in $inputFileType **/ 
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
/** Advise the Reader that we only want to load cell data, not formatting **/ 
$objReader->setReadDataOnly(true);
/**  Load $inputFileName to a PHPExcel Object  **/
$objPHPExcel = $objReader->load($inputFileName);

Use cell caching. This is a method for reducing the PHP memory required for each cell, at a cost in speed. It works by storing the cell objects in a compressed format, or outside of PHP's memory (e.g. on disk, in APC, or in memcache)... but the more memory you save, the slower your scripts will execute. You can, however, reduce the memory required by each cell to about 300 bytes, so the hypothetical 5M cells would require about 1.4GB of PHP memory.

Cell caching is described in section 4.2.1 of the Developer Documentation
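
For illustration, here is a rough sketch of how cell caching is switched on before loading the workbook, using the gzip-compressed in-memory store (other stores, such as cache_to_phpTemp with a settings array, are configured the same way); $inputFileName is assumed to be defined as in the earlier examples:

$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
if (!PHPExcel_Settings::setCacheStorageMethod($cacheMethod)) {
    die($cacheMethod . " caching method is not available" . PHP_EOL);
}

/**  Create the Reader and load the workbook as usual; cells are now cached gzip-compressed  **/
$objReader = PHPExcel_IOFactory::createReader('Excel2007');
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load($inputFileName);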

EDIT

Looking at your code, you're using the iterators (which aren't particularly efficient) and building up an array of cell data by hand. You might want to look at the toArray() method, which is already built into PHPExcel and does this for you. Also take a look at this recent discussion on SO about the newer variant, rangeToArray(), to build an associative array of row data.
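
As a sketch (the 'A1:E7' range is just an example), replacing the manual iterator loop with those built-in methods might look like this:

/**  $objPHPExcel has been loaded as in the question  **/
$objWorksheet = $objPHPExcel->getActiveSheet();

/**  Whole sheet: null for empty cells, formulas calculated, values formatted,
     rows and columns indexed by their cell references  **/
$sheetData = $objWorksheet->toArray(null, true, true, true);

/**  Or convert just a rectangular range of cells  **/
$rangeData = $objWorksheet->rangeToArray('A1:E7', null, true, true, true);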

answered by Mark Baker


I had the same memory problem with PHPExcel and, in fact, with all the other libraries I tried. Reading the data in chunks, as Mark Baker suggested, can fix the issue (caching works too), but it turned out that the memory problem became a time problem: reading and writing time grew exponentially, so for large spreadsheets it was not a good fit.

PHPExcel and the others are not meant to handle large files, so I created a library that solves this problem. You can check it out here: https://github.com/box/spout
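
To give an idea of the API, a minimal read loop with Spout looks roughly like this (based on the 2.x interface described in the project's README; the file path is only an example):

use Box\Spout\Reader\ReaderFactory;
use Box\Spout\Common\Type;

$reader = ReaderFactory::create(Type::XLSX);
$reader->open('/path/to/file.xlsx');

foreach ($reader->getSheetIterator() as $sheet) {
    foreach ($sheet->getRowIterator() as $row) {
        // in Spout 2.x each $row is a plain array of cell values
        // ... process the row, keeping only one row in memory at a time ...
    }
}

$reader->close();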

Hope that helps!

answered by Adrien


There are plenty of measures you can take to use less memory when working with PHPExcel. I recommend taking the following actions to optimize memory usage before raising your server's PHP memory limit.

/* Read cell data only, ignoring any formatting */
$objReader->setReadDataOnly(true);

/* Load only specific sheets */
$objReader->setLoadSheetsOnly( array("1", "6", "6-1", "6-2", "6-3", "6-4", "6-5", "6-6", "6-7", "6-8") );

/* Free memory when you are done with a file */
$objPHPExcel->disconnectWorksheets();
unset($objPHPExcel);

Avoid using very large Excel files; remember it is the size of the workbook that makes the process run slowly and crash.

Avoid using the getCalculatedValue() method when reading cells unless you actually need the evaluated formula results.
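
For example (the cell coordinate is just illustrative), the cheaper call looks like this:

/**  getValue() returns the raw cell contents (for formula cells, the formula string)  **/
$rawValue = $objPHPExcel->getActiveSheet()->getCell('A2')->getValue();

/**  getCalculatedValue() evaluates the formula, which costs extra time and memory  **/
// $calculated = $objPHPExcel->getActiveSheet()->getCell('A2')->getCalculatedValue();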

answered by pancy1


You can try the PHP Excel extension: http://ilia.ws/archives/237-PHP-Excel-Extension-0.9.1.html It's a C extension for PHP and it's very fast. (It also uses less memory than pure-PHP implementations.)

answered by osm


In my case, PHPExcel always iterated through 19999 rows, no matter how many rows actually contained data, so even 100 rows of data always ended up in a memory error.

Perhaps you just have to check whether the cells in the current row are empty and then "continue" or break out of the loop that iterates over the rows.
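
A minimal sketch of that check, using the same iterators as the question (treating a row as empty when every cell value is null or an empty string is just one possible heuristic):

foreach ($objPHPExcel->getActiveSheet()->getRowIterator() as $row) {
    /**  Work out whether this row contains any data at all  **/
    $rowIsEmpty = true;
    foreach ($row->getCellIterator() as $cell) {
        $value = $cell->getValue();
        if ($value !== null && $value !== '') {
            $rowIsEmpty = false;
            break;
        }
    }

    if ($rowIsEmpty) {
        break;    // or "continue;" if blank rows can appear between data rows
    }

    // ... process the non-empty row ...
}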

answered by Robert


Just reposting my post from another thread. It describes a different approach to server-side generation or editing of Excel spreadsheets that should be taken into account. For large amounts of data I would not recommend tools like PHPExcel or Apache POI (for Java) because of their memory requirements.

There is another quite convenient (although maybe a little bit fiddly) way to inject data into spreadsheets: server-side generation or updating of Excel spreadsheets can be achieved through simple XML editing. You can keep an XLSX spreadsheet sitting on the server and, every time data is gathered from the DB, unzip it using PHP. Then you access the specific XML files that hold the contents of the worksheets that need data injected and insert the data manually. Afterwards, you compress the spreadsheet folder again so it can be distributed as a regular XLSX file. The whole process is quite fast and reliable.

Obviously, there are a few issues and glitches related to the inner organisation of an XLSX/Open XML file (e.g. Excel tends to store all strings in a separate shared-strings table and uses references to that table in the worksheet files), but when injecting only data like numbers and strings it is not that hard. If anyone is interested, I can provide some code.
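
A rough sketch of that unzip/edit/re-zip cycle using PHP's built-in ZipArchive class (the template path, the worksheet name and the placeholder replacement are purely illustrative; a real implementation has to respect the shared-strings table and the worksheet XML schema):

$templatePath = 'template.xlsx';   // XLSX template sitting on the server (example path)
$outputPath   = 'report.xlsx';

copy($templatePath, $outputPath);

$zip = new ZipArchive();
if ($zip->open($outputPath) === true) {
    // worksheets live inside the package as xl/worksheets/sheetN.xml
    $sheetXml = $zip->getFromName('xl/worksheets/sheet1.xml');

    // inject data by editing the XML directly, e.g. replacing a placeholder value
    $sheetXml = str_replace('>PLACEHOLDER<', '>42<', $sheetXml);

    // write the modified worksheet back into the package and re-zip
    $zip->addFromString('xl/worksheets/sheet1.xml', $sheetXml);
    $zip->close();
}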

answered by bazinac


I ran into this problem and unfortunately none of the suggested solutions could help me. I needed the functionality that PHPExcel provides (formulas, conditional styling, etc.), so using a different library was not an option.

What I eventually did was write each worksheet to an individual (temporary) file and then combine these separate files with some special software I wrote. This reduced my memory consumption from >512 MB to well under 100 MB. See https://github.com/infostreams/excel-merge if you have the same problem.

answered by Edward