Looping through large data array in PHP

I have an array containing personal info (ID, name, email, etc.) for 100,000 users. I need to loop through each row of the array and insert a MySQL record into a table based on the row data. My problem is that I run out of memory after about 70,000 rows.

My code:

if (!empty($users)) {
    foreach ($users as $user) {
        $message = ''; // Some code to create the custom email
        queue_mail_to_send($user->user_email, $subject, $message, $db_options, $mail_options, $mail_queue);
    }
}

Background:

I am building an email system which sends out an email to the users of my site. The code above loops through the array of users and executes the function 'queue_mail_to_send', which inserts a MySQL row into an email queue table. (I am using a PEAR library to stagger the email sending.)

Question:

I know that I am simply exhausting the memory here by trying to do too much in one execution. So does anybody know a better approach to this rather than trying to execute everything in one big loop?

Thanks

el_nariz asked Apr 15 '14 09:04

2 Answers

I think reducing the memory footprint of the script will be cumbersome and will not give you a satisfying result. If you have any possibility to do so, I would advise you to log which rows you have already processed, and have the script run on the next x rows. If you can use a cron job, you can stage a mail and let the cron job add mails to the queue every 5 minutes, until all users are processed.

The easiest way would be to store, somewhere, the highest user id you have processed. I would not advise storing the number of users processed, because between batches a user can be added or removed, resulting in users not receiving the email. But if you order by user id (assuming you use an auto-incrementing column for the id!), you can be sure every user gets processed.

So your user query would be something like:

SELECT * FROM users WHERE user_id > [highest_processed_user_id] ORDER BY user_id LIMIT 1000
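The keyset-pagination idea behind that query can be sketched without a database. In this illustrative example, an in-memory array stands in for the users table, and `fetch_batch()` mirrors the `WHERE`/`ORDER BY`/`LIMIT` clauses of the query above (all names here are made up for the sketch):

```php
<?php
// Stand-in for: SELECT * FROM users WHERE user_id > :last_id ORDER BY user_id LIMIT :limit
function fetch_batch(array $rows, int $last_id, int $limit): array
{
    $next = array_values(array_filter($rows, fn($r) => $r['user_id'] > $last_id));
    usort($next, fn($a, $b) => $a['user_id'] <=> $b['user_id']);
    return array_slice($next, 0, $limit);
}

// Five fake users, deliberately out of order.
$users_table = [
    ['user_id' => 3], ['user_id' => 1], ['user_id' => 2],
    ['user_id' => 5], ['user_id' => 4],
];

$last_id = 0;   // the stored "highest processed user id"
$batches = 0;
while ($batch = fetch_batch($users_table, $last_id, 2)) {
    foreach ($batch as $row) {
        $last_id = $row['user_id']; // remember the highest processed id
    }
    $batches++;
}
// After the loop: $last_id == 5 and $batches == 3 (sizes 2, 2, 1).
```

Because each run only asks for rows above the stored id, a user inserted mid-run still gets picked up by a later batch.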

Then process your loop, and store the last user id:

if (!empty($users)) {
    $last_processed_id = null;
    foreach ($users as $user) {
        $message = ''; // Message creation magic
        queue_mail_to_send( /* parameters */ );
        $last_processed_id = $user->user_id;
    }

    // Batch done! Store the processed user id.
    $query = 'UPDATE mail_table SET last_processed_user_id = ' . $last_processed_id; // please use a parameterized statement here
    // execute the query
}

And on the next execution, do it again until all users have received the mail.
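If you drive the script with cron as suggested above, the crontab entry could look like this (the PHP binary path, script path, and log file are placeholders, not taken from the question):

```shell
# Run the batch mailer every 5 minutes; each run processes the next batch of users.
*/5 * * * * /usr/bin/php /var/www/scripts/send_mail_batch.php >> /var/log/mail_batch.log 2>&1
```

Each invocation is a fresh PHP process, so memory used by one batch is fully released before the next one starts.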

giorgio answered Sep 19 '22 12:09

I had exactly the same problem as you. Anyway, the answer from @giorgio is the best solution.

But like Java or Python, we have `yield` in PHP. See [the generators syntax page in the PHP manual](http://php.net/manual/en/language.generators.syntax.php).

Here is my sample code; my case was 50,000 records, and I also tested it successfully with 370,000 records (though it takes time).

// yield is only valid inside a function body, so the loop is wrapped in a
// static method; calling it returns a Generator instead of a full array.
public static function findAllLazy()
{
    $items = CustomerService::findAll();
    foreach ($items as $item) {
        yield (new self())->loadFromResource($item);
    }
}
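To illustrate why the generator approach keeps memory flat, here is a minimal, self-contained sketch: rows are yielded one at a time instead of materializing a 100,000-element array first. `userRows()` is purely illustrative and stands in for a lazy `findAll()`:

```php
<?php
// A generator producing fake user rows lazily: at any moment only one
// row exists in memory, no matter how large $count is.
function userRows(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['user_id' => $i];
    }
}

$processed = 0;
foreach (userRows(100000) as $row) {
    $processed++; // queue_mail_to_send(...) would go here
}
// $processed is now 100000, yet peak memory stayed near-constant.
```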
ThangTD answered Sep 20 '22 12:09