
Recursive directory iterator with offset

Is it possible to start the loop from a certain point?

$iterator = new \RecursiveIteratorIterator(new \RecursiveDirectoryIterator($path, $flags));

$startTime = microtime(true); 
foreach($iterator as $pathName => $file){

  // file processing here

  // after 5 seconds stop and continue in the next request
  $elapsedSecs = (microtime(true) - $startTime);
  if($elapsedSecs > 5)
     break;
}

But how do I resume from my break point in the next request?

asked Dec 18 '14 at 13:12 by nice ass



1 Answer

a) Pull the time calculation out of the foreach. You have a start time and want a runtime of 5 seconds, so you can calculate the end time beforehand (start time + 5 s). Inside the foreach, simply check whether the current time is greater than or equal to the end time, and break if it is.
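
As a minimal sketch of point a), the deadline can be computed once before the loop and only compared inside it. The function name processUntil() and the $handler callback are hypothetical and exist only for this illustration.

```php
// Process items until a deadline that is computed once, before the loop.
// processUntil() and $handler are hypothetical names used for illustration.
function processUntil(iterable $items, float $budgetSecs, callable $handler): int
{
    $endTime   = microtime(true) + $budgetSecs; // computed once, up front
    $processed = 0;

    foreach ($items as $key => $item) {
        if (microtime(true) >= $endTime) {
            break; // time budget used up, stop here
        }
        $handler($item, $key);
        $processed++;
    }

    return $processed; // number of items handled in this run
}
```

The return value tells the caller how far the run got, which is the piece of information needed for resuming later.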

b) Q: Is it possible to start the loop from a certain point? How do I resume from my break point in the next request?

Two approaches come to mind.

You could store the last processing point and resume at last point + 1. You would save the last position of the iteration and fast-forward to it on the next request by calling $iterator->next() until you reach the next item to process, which is $lastPosition + 1. The iterator and $lastPosition have to be stored and picked up again on each following request, until $lastPosition equals the total number of elements in the iterator.
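
The fast-forward idea could look roughly like the sketch below. It makes a few assumptions that are not in the answer: the iterator is re-created from the same $path on every request (as far as I know, directory iterators cannot be serialized, so only the position is persisted), the directory contents and their order do not change between requests, and resumeTraversal(), $offset and $handler are hypothetical names. PHP's built-in LimitIterator could probably do the skipping as well; the manual loop is used here to keep the steps visible.

```php
// Resume a traversal by skipping the entries handled in earlier requests.
// resumeTraversal() and $handler are hypothetical names; $offset is the number
// of entries already processed, which the caller must persist between requests.
function resumeTraversal(string $path, int $offset, float $budgetSecs, callable $handler): int
{
    $iterator = new \RecursiveIteratorIterator(
        new \RecursiveDirectoryIterator($path, \FilesystemIterator::SKIP_DOTS)
    );
    $endTime  = microtime(true) + $budgetSecs;
    $position = 0;

    foreach ($iterator as $pathName => $file) {
        if ($position < $offset) {
            $position++; // fast-forward past entries already handled
            continue;
        }
        if (microtime(true) >= $endTime) {
            break; // out of time, the caller resumes from $position
        }
        $handler($file, $pathName);
        $position++;
    }

    return $position; // store this and pass it back in as $offset next time
}
```

Skipping still walks the directory tree from the start on every request, so the cost of the fast-forward grows with the offset.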

Or you could turn the iterator into an array on the first run: $array = iterator_to_array($iterator, false); (passing false as the second argument gives the array numeric keys, which the loop below relies on) and then reduce that array step by step. (Maybe someone else knows how to reduce an iterator object.) With this approach you only store the data, which shrinks request by request until nothing is left.

The code is untested. It's just a quick draft.

$starttime = time();
$endtime = $starttime + 5; // 5 seconds
$totalElements = count($array);

for($i = 0; $i < $totalElements; $i++) 
{
    if(time() >= $endtime) {
        break;
    }

    doStuffWith($array[$i]);
}

echo 'Processed ' . $i . ' elements in 5 seconds';

// exit condition is "elements left to process = 0";
// 1 or more means there is more work to do
if( ($totalElements - $i) >= 1) {

    // chop off all the processed items from the initial array
    // and build the array for the next processing request
    $reduced_array = array_slice($array, $i);

    // save the reduced array to cache, session, disk    
    store($reduced_array);
} else {
    echo 'Done.';
}

// on the next request, load the array and resume the steps above...
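
The draft leaves the storing and loading steps open. One possible way to fill them in is sketched below, under these assumptions: only path names (plain strings) are stored rather than the file objects, the state is kept in a JSON file located outside the scanned directory, and runBatch(), $stateFile and $handler are hypothetical names.

```php
// Run one time-boxed batch; the list of pending path names lives in a JSON file.
// runBatch(), $stateFile and $handler are hypothetical names for illustration.
function runBatch(string $path, string $stateFile, float $budgetSecs, callable $handler): int
{
    if (is_file($stateFile)) {
        // Later request: pick up the path names left over from the previous run.
        $pending = json_decode(file_get_contents($stateFile), true);
    } else {
        // First request: collect path names only (plain strings are easy to store).
        $iterator = new \RecursiveIteratorIterator(
            new \RecursiveDirectoryIterator($path, \FilesystemIterator::SKIP_DOTS)
        );
        // By default the iterator keys are the path names.
        $pending = array_keys(iterator_to_array($iterator));
    }

    $endTime = microtime(true) + $budgetSecs;
    $done    = 0;
    foreach ($pending as $pathName) {
        if (microtime(true) >= $endTime) {
            break;
        }
        $handler($pathName);
        $done++;
    }

    $pending = array_slice($pending, $done); // drop what was processed
    if ($pending === []) {
        if (is_file($stateFile)) {
            unlink($stateFile); // finished: clear the saved state
        }
    } else {
        file_put_contents($stateFile, json_encode($pending));
    }

    return count($pending); // 0 means everything has been processed
}
```

Each request calls runBatch() with the same arguments and repeats until it returns 0.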

All in all, this is batch processing and might be done more efficiently by a worker/job-queue, like:

  • Gearman (the PHP manual has some Gearman examples) or
  • RabbitMQ / AMQP or
  • the PHP libs listed here: https://github.com/ziadoz/awesome-php#queue.
answered Oct 15 '22 at 17:10 by Jens A. Koch
