
Reading .xls file via PHPExcel throws Fatal error: allowed memory size... even with chunk reader

I'm using PHPExcel to read .xls files. After quite a short time I hit:

Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 730624 bytes) in Excel\PHPExcel\Shared\OLERead.php on line 93

After some googling I tried a chunk reader to prevent this (it is even mentioned on the PHPExcel home site), but I'm still stuck with this error.

My thinking is that with the chunk reader I read the file part by part, so memory should not overflow. But is there some serious memory leak? Or am I freeing memory incorrectly? I even tried raising the server RAM to 1 GB. The file I am trying to read is about 700 KB, which is not much (I also read ~20 MB pdf, xlsx, docx, doc, etc. files without issue). So I assume there is just some minor detail I have overlooked.

The code looks like this:

function parseXLS($fileName){
    // dirname(__FILE__) has no trailing slash, so the relative part starts with "/" (not "./")
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/IOFactory.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/ChunkReadFilter.php';

    $inputFileType = 'Excel5';

    /**  Create a new Reader of the type defined in $inputFileType  **/
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    /**  Define how many rows we want to read for each "chunk"  **/ 
    $chunkSize = 20;
    /**  Create a new Instance of our Read Filter  **/ 
    $chunkFilter = new chunkReadFilter(); 
    /**  Tell the Reader that we want to use the Read Filter that we've Instantiated  **/ 
    $objReader->setReadFilter($chunkFilter); 

    /**  Loop to read our worksheet in "chunk size" blocks  **/ 
    /**  $startRow is set to 2 initially because we always read the headings in row #1  **/
    for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) { 
        /**  Tell the Read Filter, the limits on which rows we want to read this iteration  **/ 
        $chunkFilter->setRows($startRow,$chunkSize); 
        /**  Load only the rows that match our filter from $fileName to a PHPExcel Object  **/ 
        $objPHPExcel = $objReader->load($fileName); 
        //    Do some processing here 

        //    Free up some of the memory 
        $objPHPExcel->disconnectWorksheets(); 
        unset($objPHPExcel); 
    }
}

And here is the code for the chunk read filter:

class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;

    /**  Set the list of rows that we want to read  */ 
    public function setRows($startRow, $chunkSize) { 
        $this->_startRow    = $startRow; 
        $this->_endRow      = $startRow + $chunkSize;
    } 

    public function readCell($column, $row, $worksheetName = '') {
        //  Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow 
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) { 
           return true;
        }
        return false;
    } 
}
Luboš Suk asked Apr 14 '16 08:04


2 Answers

I found an interesting solution in the question "How to read large worksheets from large Excel files (27MB+) with PHPExcel?", under Addendum 3.

edit1: Even with this solution I hit a choke point with my favourite error message, but I found something about caching, so I implemented this:

$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB'); // key without surrounding spaces
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
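
To illustrate why the exact array key and the temp stream matter, here is a minimal, self-contained sketch. It rests on assumptions that are not verified against the PHPExcel source here: that the library looks the setting up under the exact key memoryCacheSize with a fallback default, and that cache_to_phpTemp is backed by a php://temp stream. The helper readMemoryCacheSize() and the '1MB' default are hypothetical stand-ins, not PHPExcel API.

```php
<?php
// Hypothetical helper: mimics reading a setting by its exact key with a fallback.
// It is NOT a PHPExcel function; the '1MB' default is an assumption for illustration.
function readMemoryCacheSize(array $settings, $default = '1MB')
{
    return isset($settings['memoryCacheSize']) ? $settings['memoryCacheSize'] : $default;
}

$padded = array(' memoryCacheSize ' => '8MB'); // key padded with spaces
$exact  = array('memoryCacheSize' => '8MB');   // key without spaces

var_dump(readMemoryCacheSize($padded)); // padded key is not found, so the default is used
var_dump(readMemoryCacheSize($exact));  // exact key is found, so the configured value is used

// php://temp keeps data in memory up to maxmemory bytes and
// switches to a temporary file once that limit is exceeded.
$limit  = 8 * 1024 * 1024;
$stream = fopen('php://temp/maxmemory:' . $limit, 'w+');
fwrite($stream, str_repeat('x', 1024)); // well below the limit, so this stays in memory
rewind($stream);
$readBack = stream_get_contents($stream);
fclose($stream);
```

If the assumption about the key lookup holds, a padded key would silently leave the cache at its default size, which is worth ruling out before tuning the value itself.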

So far I have tested it only on .xls files smaller than 10 MB, but it seems to work (I also set $objReader->setReadDataOnly(true);), and it seems to strike a reasonable balance between speed and memory consumption. (I will keep following this thorny path if possible.)

edit2: After some further research I found the chunk reader unnecessary in my case (the memory issue seems to be the same with and without it). So my final answer is the function below, which reads an .xls file (only the data from cells, without formatting, and filtering out formulas). When I use cache_to_phpTemp I am able to read .xls files (tested up to 10 MB, about 10k rows and multiple columns) in a matter of seconds and without memory issues.

function parseXLS($fileName){

    // Load PHPExcel. dirname(__FILE__) has no trailing slash, so the path starts with "/".
    // The chunk read filter is no longer required here.
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/IOFactory.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel.php';

    $inputFileName = $fileName;
    $fileContent = "";

    // Detect the input file type (most of the time Excel5)
    $inputFileType = PHPExcel_IOFactory::identify($inputFileName);

    // Set up cell caching so PHPExcel does not exhaust memory.
    // The settings key is written without surrounding spaces.
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => '8MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

    // Create a reader for the detected file type
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);

    // Read only cell data (no formatting) to save memory and time
    $objReader->setReadDataOnly(true);

    // Load the file into a PHPExcel object
    $objPHPExcel = $objReader->load($inputFileName);

    // Loop over all sheets in the workbook
    foreach ($objPHPExcel->getWorksheetIterator() as $worksheet) {
        // Loop over the rows of the sheet
        foreach ($worksheet->getRowIterator() as $row) {
            $cellIterator = $row->getCellIterator();
            // false = also visit cells that hold no value (they are skipped below)
            $cellIterator->setIterateOnlyExistingCells(false);

            foreach ($cellIterator as $cell) {
                // Make sure the iterator actually returned a cell
                if (!is_null($cell)) {
                    // Raw value, without formatting
                    $rawValue = $cell->getValue();
                    $text = trim((string) $rawValue);
                    // Keep the value unless it is empty or a formula (starts with "=")
                    if ($text !== '' && $text[0] !== '=') {
                        $fileContent .= $rawValue . " ";
                    }
                }
            }
        }
    }

    return $fileContent;
}
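
To check whether a change like the caching above actually lowers memory use, peak memory can be measured around the call. This is a minimal sketch, not part of the original solution: measurePeakMemory() is a hypothetical helper name, and the closure is a stand-in workload so the snippet runs without PHPExcel installed.

```php
<?php
// Hypothetical helper (not part of PHPExcel): runs a callable and reports peak memory.
function measurePeakMemory(callable $work)
{
    $before = memory_get_peak_usage(true);
    $result = $work();
    $after  = memory_get_peak_usage(true);

    return array(
        'result'     => $result,
        'peak_bytes' => $after,           // peak memory allocated from the system so far
        'grew_by'    => $after - $before, // how much the peak rose during the call
    );
}

// Stand-in workload. In practice this could be:
//   function () { return parseXLS('some-file.xls'); }
$report = measurePeakMemory(function () {
    return str_repeat('x', 2 * 1024 * 1024);
});

printf(
    "peak: %.1f MB, grew by: %.1f MB\n",
    $report['peak_bytes'] / 1048576,
    $report['grew_by'] / 1048576
);
```

Running it once with and once without the cache settings gives a rough comparison. The numbers depend on the PHP version and on the file being read.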
Luboš Suk answered Oct 28 '22 17:10


Hope the following links help:

PHPExcel runs out of 256, 512 and also 1024MB of RAM

http://phpexcel.codeplex.com/discussions/242712?ProjectName=phpexcel

Shivani P answered Oct 28 '22 18:10