 

CakePHP recommendation to iterate a huge table and generate a sitemap?

I'm trying to create an XML sitemap with CakePHP from a table that currently has more than 50,000 records, each record corresponding to a URI in the sitemap. The problem I'm facing is that CakePHP runs out of memory while generating it, for two reasons:

  1. A find('all') builds a huge associative array of the entire set of 50,000 URIs.
  2. Since I don't want to output HTML from the controller itself, I pass the associative array containing URI, priority, change frequency, etc. to the view with a $this->set() call -- again a huge array with 50,000 entries.

Is it possible at all to do this while following MVC and CakePHP guidelines?
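For context, the memory-hungry version described above looks roughly like this; it is only a sketch, and the controller, the Page model and its url, priority and changefreq fields are placeholder names, not from the question:

// SitemapsController::index() -- illustrative only; model and field names are assumptions.
public function index() {
    // find('all') materialises all 50,000 rows as one nested associative array...
    $pages = $this->Page->find('all', array(
        'fields' => array('Page.url', 'Page.priority', 'Page.changefreq'),
    ));

    // ...and set() hands that same huge array over to the view.
    $this->set('pages', $pages);
}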

asked Mar 03 '10 by Alex J

3 Answers

I know this question is old, but for really huge queries there is still no good solution, I think.

To iterate through a huge result set you can use DboSource methods.

First, get the DBO:

$dbo = $this->Model->getDataSource();

Build the query:

$sql = $dbo->buildStatement($options, $this->Model); // buildStatement() takes the query array and the model

Then execute the statement and iterate through the results:

if ($dbo->execute($sql)) {
    while ($dbo->hasResult() && $row = $dbo->fetchResult()) {
        // $row is an array with the same structure as a find('first') result
    }
}
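Tying the pieces together for the sitemap case: write each <url> entry inside the loop instead of collecting rows, so memory stays flat. This is only a sketch; the Page model, its url and modified columns, and the exact keys buildStatement() expects (they vary a little between CakePHP versions) are assumptions here.

// Sketch only: the model, its columns, and the $options keys are assumptions.
$dbo = $this->Page->getDataSource();

$options = array(
    'table'      => $dbo->fullTableName($this->Page),
    'alias'      => 'Page',
    'fields'     => array('Page.url', 'Page.modified'),
    'conditions' => array(),
    'joins'      => array(),
    'order'      => null,
    'limit'      => null,
    'offset'     => null,
    'group'      => null,
);

$sql = $dbo->buildStatement($options, $this->Page);

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

if ($dbo->execute($sql)) {
    while ($dbo->hasResult() && $row = $dbo->fetchResult()) {
        // One row is fetched and written per iteration; nothing accumulates.
        echo '<url>';
        echo '<loc>' . h($row['Page']['url']) . '</loc>';
        echo '<lastmod>' . date('Y-m-d', strtotime($row['Page']['modified'])) . '</lastmod>';
        echo '</url>' . "\n";
    }
}

echo '</urlset>' . "\n";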
answered Nov 09 '22 by Gerd K


I had a similar problem this week and stumbled across the Containable behavior. It lets you cut down the association-related queries (if you have any).

The best solution would be to use LIMIT and OFFSET programmatically and loop through the record set in small chunks. This saves you from stuffing 50,000 records into memory at once.
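A minimal sketch of that chunked approach (the Page model, the field names, and the chunk size of 1000 are assumptions; the 'contain' => false option only applies if the Containable behavior mentioned above is attached):

// Process the table in chunks so only one chunk is ever in memory.
$limit = 1000;
$page = 1;

do {
    $rows = $this->Page->find('all', array(
        'fields'  => array('Page.url', 'Page.modified'),
        'contain' => false,            // skip associated models (Containable)
        'limit'   => $limit,
        'page'    => $page,
        'order'   => 'Page.id ASC',    // a stable order keeps the paging consistent
    ));

    foreach ($rows as $row) {
        // emit or buffer one <url> entry per row here
    }

    $page++;
} while (count($rows) === $limit);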

answered Nov 09 '22 by bojo


Are you sure you really run out of memory on 50,000 records? Even if a row is 1 KB in size (pretty huge), that is only about 50 MB of data; my P1 had enough RAM to handle that. Raise memory_limit in php.ini above the default (and consider tweaking max_execution_time as well).
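For reference, both limits can also be raised for just this one action instead of globally (the values here are arbitrary examples):

// Raise the limits for this request only; 256M and 300 seconds are example values.
ini_set('memory_limit', '256M');
set_time_limit(300);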

On the other hand, if you consider the data set too huge and processing it too resource-intensive, you should not serve that page dynamically; it is perfect DDoS bait (at the very least, cache it heavily). You could instead schedule a cron job to regenerate the page every X hours with a server-side script that, free from the MVC overhead of handing all the data to the view at once, can work through the rows sequentially.
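A minimal sketch of that cron-driven approach as a CakePHP 2.x console shell; the shell name, the Page model, its url field, and the output path are all assumptions:

// app/Console/Command/SitemapShell.php
// Run from cron, e.g.:  0 */6 * * *  /path/to/app/Console/cake sitemap
App::uses('AppShell', 'Console/Command');

class SitemapShell extends AppShell {

    public $uses = array('Page');   // hypothetical model backing the 50,000 URIs

    public function main() {
        $fh = fopen(WWW_ROOT . 'sitemap.xml', 'w');
        fwrite($fh, '<?xml version="1.0" encoding="UTF-8"?>' . "\n");
        fwrite($fh, '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n");

        // Work through the rows sequentially, one chunk at a time,
        // writing straight to the file so memory use stays flat.
        $limit = 1000;
        $page = 1;
        do {
            $rows = $this->Page->find('all', array(
                'fields' => array('Page.url'),
                'limit'  => $limit,
                'page'   => $page,
            ));
            foreach ($rows as $row) {
                fwrite($fh, '<url><loc>' . h($row['Page']['url']) . '</loc></url>' . "\n");
            }
            $page++;
        } while (count($rows) === $limit);

        fwrite($fh, '</urlset>' . "\n");
        fclose($fh);
    }
}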

answered Nov 09 '22 by sibidiba