Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Laravel: Working a large number of tasks into a queue

I'm building a web application using Laravel 5 that creates links to the web application that when visited show a form for a student's progress report. These links are sent by the web application to an email of a contact at the institution the student attends in order for the recipient to complete the progress report accessed by the link in the email.

The problem I face is when creating and sending links. I have some code which works fine with a couple of hundred students, however in real world use the application would be potentially creating and sending 3000+ or so links at one time. The code I've written simply can't handle such a large number in a timely manner and the application crashes. Strangely enough though I don't receive any timeout error via laravel (I'll need to double check the php logs).

Although I'm more than welcome to other suggestions, I believe the answer to the problem is utilising queues. I have already used queues when sending email (see code), but I would like to work some other parts of the code into queues, but I'm a bit unsure how to do this!

Brief database schema

Student hasMany Link

Student hasMany InstitutionContact (limited to two by my application)

Link hasMany InstitutionContact (limited to two by my application)

Email manyToMany Link

What I am trying to accomplish

  • Get all the Student's that require a new Link

  • Create a Link for each Student

  • Assign the Student's current InstitutionContacts to the Link's InstitutionContact (A Student's institution contact can change, so I link the InstitutionContact to the link if needed to resend.

  • Loop through all the newly created Links in order to group them together by shared InstitutionContacts - this is so an email is not sent per link (thus possibly sending many emails with one link to the same address), rather links should be grouped together by the same email/contact and sent together where applicable

  • Loop through all the Links grouped by email/contact and:

    1. Send an email including the Link's info (url, student name etc) to the designated InstitutionContact's email address
    2. Write a copy of the Email to the database
    3. Join the Email created in the former step to the Link(s) that were sent in it (so the application can be used to search what link was sent in which email)

So the main challenge I'm facing is performing the aforementioned task with a large dataset. I have already considered creating and sending a Link one by one via a queue, however this wouldn't allow me to group all the Links together by contact/email. As the task wouldn't be performed regularly I would be open to consider performing the task as it is with an increase in memory and time for the process, however I didn't have much success when attempting this using set_time_limit(0); and ini_set('memory_limit','1056M'); before sending any links.

Any help would be really appreciated, thank you if you read this far!

Code

app\Http\Controllers\LinkController.php

public function storeAndSendMass(Request $request)
{
    $this->validate($request, [
        'student_id' => 'required|array',
        'subject'    => 'required|max:255',
        'body'       => 'required|max:5000',
    ]);

    $studentIds = $request->get('student_id');
    $subject    = $request->get('subject');
    $body       = $request->get('body');

    $students = $this->student
        ->with('institutionContacts')
        ->whereIn('id', $studentIds)
        ->where('is_active', 1)
        ->get();

    // create link, see Link.php below for method
    $newLinks = $this->link->createActiveLink($students);

    // send link to student's contact(s), see LinkEmailer.php below for method
    $this->linkEmailer->send($newLinks, ['subject' => $subject, 'body' => $body], 'mass');

    // return
    return response()->json([
        'message' => 'Creating and sending links'
    ]);
}

app\Models\Link.php

public function createActiveLink($students)
{
    $links = [];

    foreach ($students as $student) {
        $newLink = $this->create([
            'token'          => $student->id, // automatically hashed
            'status'         => 'active',
            'sacb_refno'     => $student->sacb_refno,
            'course_title'   => $student->course_title,
            'university_id'  => $student->university_id,
            'student_id'     => $student->id,
            'institution_id' => $student->institution_id,
            'course_id'      => $student->course_id,
        ]);

        $studentContacts = $student->institutionContacts;

        if ($studentContacts) {

            foreach ($studentContacts as $studentContact) {

                $newLink->contacts()->create([
                    'type'                   => $studentContact->pivot->type,
                    'institution_contact_id' => $studentContact->pivot->institution_contact_id
                ]);

                $newLink->save();
            }

        }

        $links[] = $newLink->load('student');
    }

    return $links;
}

app\Emails\LinkEmailer.php

namespace App\Emails;

use App\Emails\EmailComposer;

class LinkEmailer
{
    protected $emailComposer;

    public function __construct(EmailComposer $emailComposer)
    {
        $this->emailComposer = $emailComposer;
    }

    public function send($links, $emailDetails, $emailType)
    {        
        $contactsAndLinks = $this->arrangeContactsToLinks($links);

        foreach ($contactsAndLinks as $linksAndContact) {

            $emailData = array_merge($linksAndContact, $emailDetails);

            // send/queue email
            \Mail::queue('emails/queued/reports', $emailData, function ($message) use ($emailData) {
                $message
                    ->to($emailData['email'], $emailData['formal_name'])
                    ->subject($emailData['subject']);
            });

            // compose email message, returns text of the email
            $emailMessage = $this->emailComposer->composeMessage($emailData);

            // // create Email
            $email = \App\Models\Email::create([
                'to'      => $emailData['email'],
                'from'    => '[email protected]',
                'subject' => $emailData['subject'],
                'body'    => $emailMessage,
                'type'    => $emailType,
                'user'    => $_SERVER['REMOTE_USER']
            ]);

            foreach ($linksAndContact['links'] as $link) {
                $link->emails()->attach($email->id);
            }
        }
    }

    // group links by contact
    public function arrangeContactsToLinks($links)
    {
        $contactsForLinks = [];
        $assigned         = false;
        $match            = false;

        foreach ($links as $link) { // 1, n

            if ($link->contacts) {

                foreach ($link->contacts as $contact) { // 1, 2

                    if ($contactsForLinks) {

                        $assigned = false;

                        foreach ($contactsForLinks as $key => $contactLink) { // n
                            // assign links to existing email in array
                            if ($contactLink['email'] === $contact->institutionContact->email) {
                                $match = false;

                                // check link hasn't already been included
                                foreach ($contactsForLinks[$key]['links'] as $assignedLink) {
                                    if ($assignedLink === $link) {
                                        $match = true;
                                    }
                                }

                                // if there was no match add to list of links
                                if (!$match) {
                                    $contactsForLinks[$key]['links'][] = $link->load('student');
                                    $assigned = true;
                                    break;
                                }
                            }
                        }

                        if (!$assigned) {
                            $contactsForLinks[] = [
                                'email'                 => $contact->institutionContact->email,
                                'formal_name'           => $contact->institutionContact->formal_name,
                                'requires_id'           => $contact->institutionContact->institution->requires_id,
                                'requires_course_title' => $contact->institutionContact->institution->requires_course_title,
                                'links'                 => [$link->load('student')],
                            ];
                        }
                    } else {
                        $contactsForLinks[] = [
                            'email'                 => $contact->institutionContact->email,
                            'formal_name'           => $contact->institutionContact->formal_name,
                            'requires_id'           => $contact->institutionContact->institution->requires_id,
                            'requires_course_title' => $contact->institutionContact->institution->requires_course_title,
                            'links'                 => [$link->load('student')],
                        ];
                    }
                }
            }
        }

        return $contactsForLinks;
    }
}

Edit 1

I've got this working now with set_time_limit(0); and ini_set('memory_limit','1056M'); it took 8 mins to do 3000 students.

Edit 2

I'm running Laravel Framework version 5.1.6 (LTS), MySQL for DB.

Edit 3

Appreciate all the answers so far, thank you all. I am thinking that I may work the link creation process into a queue which will have a related entity in the database called something like Batch and when that Batch of links has finished being created then group all the Links from that Batch and send them.

I could use the approach that @denis-mysenko suggested by having a sent_at field in the Links table and have a scheduled process to check for Links that haven't been sent and then send them. However, using the aforementioned approach I can send the Batch of Links when they've all finished being created, whereas with the sent_at approach with a scheduled process looking for Links that haven't been sent it could potentially send some links when all the links haven't been created yet.

like image 615
haakym Avatar asked Feb 26 '16 09:02

haakym


2 Answers

If you've tested your code with a small amount of data and it succeeds without crashing, it's clear that the problem is (as you said) the quite high number of records you're dealing with. Why don't you process your collection with the chunk method?

Accordingly to the Laravel docs:

If you need to process thousands of Eloquent records, use the chunk command. The chunk method will retrieve a "chunk" of Eloquent models, feeding them to a given Closure for processing. Using the chunk method will conserve memory when working with large result sets

Anyway, I think the use of a queue it's required in this kind of scenarios. I believe that working on a large set of data on a HTTP request should be absolutely avoided due to the high risk of request timeout. The queued process doesn't have the limit of execution time.

Why don't you use the queue AND the chunk method on your collection together? This will make you able to:

  • avoid timeout errors
  • use a reasonable amount of memory during the execution of the process (this depends on the number of data passed to the chunk closure)
  • give you priority control over this queued process and (eventually) other queued processes

The Laravel docs covers all you need: Eloquent - Retrieving multiple models (see the "Chunking results" chapter for going deeper on how to save memory when dealing with a large data set) and Queues for creating jobs and detach some parts of your software that should not run under your webserver, avoiding the risk of timeouts

like image 83
Andrea Alhena Avatar answered Oct 30 '22 00:10

Andrea Alhena


I would propose to change the architecture. I think it's unnecessarily overcomplicated.

Controller could like like:

public function storeAndSendMass(Request $request, LinkEmailer $linkEmailer)
{
    $this->validate($request, [
        'student_id' => 'required|array',
        'subject'    => 'required|max:255',
        'body'       => 'required|max:5000',
    ]);

    $students = $this->student
        ->with('institutionContacts')
        ->whereIn('id', $request->input('student_id'))
        ->where('is_active', 1)
        ->get();

    // Don't use Link.php method at all
    foreach ($students as $student) {
        $student->links()->create([
            'token'          => $student->id, // automatically hashed
            'status'         => 'active',
            'sent_at'        => null,
            'email_body'     => $request->input('body'),
            'email_subject'  => $request->input('subject')
        ]);
    }

    return response()->json([
        'message' => 'Creating and sending links'
    ]);
}

Why keep so many fields in Link model that already exist in Student model and are accessible via student() relationship? You could just keep the status and the token (I assume it's part of the link?), as well as 'sent_at' timestamp. If links are usually sent only once, it's reasonable to keep the email body and subject there as well.

If student updates his institution contacts, fresh data will be used at the time of email composition because you are not linking links to institution contacts explicitly.

Then, I would create a command (let's say newLinkNotifier) that would run, for instance, every 10 minutes and that would look for links that haven't been sent yet ($links->whereNull('sent_at')), group them by email ($link->student->institutionContacts) and email content ($link->email_body, $link->email_subject) and create queued email jobs. Then a queue worker would send those emails (or you could set queue to 'async' to send them right away from the command).

Since this command will run async, it doesn't really matter if it takes 5 minutes to finish. But in real life, it would take less than a minute for thousands and thousands of objects.

How to do the grouping? I would probably just rely on MySQL (if you are using it), it will do the job faster than PHP. Since all 3 fields are accessible from SQL tables (two directly, another from JOIN) - it's actually a pretty simple task.

In the end your emailers send() method will become as trivial as:

public function send()
{        
    $links = Link::whereNull('sent_at')->get();

    // In this example, we will group in PHP, not in MySQL
    $grouped = $links->groupBy(function ($item, $key) {
        return implode('.', [$item->email, $item->email_body, $item->email_subject]);
    });

    $grouped->toArray();

    foreach ($grouped as $group) {
        // We know all items in inside $group array got the same
        // email_body, email, email_subject values anyway!
        Mail::queue('emails/queued/reports', $group[0]->email_body, function ($message) use ($group) {
            $message
                ->to($group[0]->email)
                ->subject($group[0]->email_subject);
        });    
    }
}

This is not perfect yet, and I haven't tested this code - I wrote it right here, but it shows the proposed concept.

If you don't plan to get millions of entries - this is probably good enough. Otherwise, you could move link creation into a separate async job as well.

like image 2
Denis Mysenko Avatar answered Oct 30 '22 02:10

Denis Mysenko