 

Best way to run automated task every minute when site is on multiple servers

I need to set up an automated task that runs every minute and sends the emails in the queue. I'm using ASP.NET 4.5 and C#. Currently, I use a scheduler class that starts in global.asax and relies on caching and a cache-expiration callback. I've read this approach leads to several problems.

The reason I did it that way is because this app runs on multiple load balanced servers and this allows me to have the execution in one place and the code will run even if one or more servers are offline.

I'm looking for some direction to make this better. I've read about Quartz.NET but have never used it. Does Quartz.NET call methods from the application, from a Windows service, or from a web service?

I've also read about using a Windows service, but as far as I can tell, those are installed directly on each server. The thing is, I need the task to execute regardless of how many servers are online, without duplicating the work. For example, if I have a scheduled task set up on server 1 and server 2, they would both run together, duplicating the requests. However, if server 1 were offline, I'd need server 2 to run the task.

Any advice on how to move forward here, or is the global.asax method the best way for a multi-server environment? BTW, the web servers are running Windows Server 2012 with IIS 8.

EDIT

In response to a request for more information: the queue is stored in a database. I should also mention that the database servers are separate from the web servers. There are two database servers, but only one runs at a time; they both read from central storage, so there is only one instance of the database. When one database server goes down, the other comes online.

That being said, would it make more sense to deploy a Windows service to both database servers? That would ensure only one runs at a time.

Also, what are your thoughts about running Quartz.NET from the application? As millimoose mentions, I don't necessarily need it running on the web front end; however, doing so saves me from deploying a Windows service to multiple machines, and I don't think there would be a performance difference either way. Thoughts?

Thanks everyone for the input so far. If any additional info is needed, please let me know.

asked Feb 05 '13 by Ricketts


1 Answer

I have had to tackle the exact problem you're facing now.

First, you have to realize that you absolutely cannot reliably run a long-running process inside ASP.NET. If you instantiate your scheduler class from global.asax, you have no control over the lifetime of that class.

In other words, IIS may decide to recycle the worker process that hosts your class at any time. At best, this means your class will be destroyed (and there's nothing you can do about it). At worst, your class will be killed in the middle of doing work. Oops.

The appropriate way to run a long-lived process is by installing a Windows Service on the machine. I'd install the service on each web box, not on the database.

The service instantiates the Quartz scheduler. This way, you know your scheduler is guaranteed to keep running as long as the machine is up. When it's time for a job to run, Quartz simply calls a method on an IJob class that you specify.

using Quartz;

class EmailSender : IJob
{
    // Quartz calls Execute each time the job's trigger fires.
    public void Execute(JobExecutionContext context)
    {
        // send your emails here
    }
}

Keep in mind that Quartz calls the Execute method on a separate thread, so you must be careful to be thread-safe.
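For reference, here's a minimal sketch of how the service's startup might wire this together. This assumes the Quartz 1.x-era API that matches the example above; the job and trigger names are made up, and the one-minute interval comes from the question:

```csharp
using Quartz;
using Quartz.Impl;

// Typically placed in the Windows service's OnStart.
ISchedulerFactory factory = new StdSchedulerFactory();
IScheduler scheduler = factory.GetScheduler();

// EmailSender is the IJob implementation shown above.
JobDetail job = new JobDetail("emailSender", null, typeof(EmailSender));

Trigger trigger = TriggerUtils.MakeMinutelyTrigger(); // fires every minute
trigger.Name = "everyMinute";

scheduler.ScheduleJob(job, trigger);
scheduler.Start();

// In OnStop, shut down cleanly so in-flight jobs can finish:
// scheduler.Shutdown(true);
```

Note that newer Quartz.NET versions replaced this with a fluent TriggerBuilder/JobBuilder API, but the overall shape (factory, scheduler, job, trigger) is the same.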

Of course, you'll now have the same service running on multiple machines. While it sounds like you're concerned about this, you can actually leverage this into a positive thing!

What I did was add a "lock" column to my database. When a send job executes, it claims specific emails in the queue by setting the lock column. For example, when the job executes, generate a GUID and then:

UPDATE EmailQueue SET Lock=someGuid WHERE Lock IS NULL LIMIT 1;
SELECT * FROM EmailQueue WHERE Lock=someGuid;

In this way, you let the database server deal with the concurrency. The UPDATE query tells the DB to assign one email in the queue (one that is currently unassigned) to the current instance. You then SELECT the locked email and send it. Once sent, delete the email from the queue (or handle it however you track sent email), and repeat the process until the queue is empty.
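As a sketch, the loop each node runs might look like the following. The EmailQueue table and Lock column follow the queries above; the connection string, the other column names, and the SendEmail helper are hypothetical, and TOP (1) is the SQL Server spelling of the MySQL-style LIMIT 1 shown above:

```csharp
using System;
using System.Data.SqlClient;

// Hypothetical sketch: claim one unlocked email, send it, delete it,
// and repeat until no unlocked rows remain.
void DrainQueue(string connectionString)
{
    while (true)
    {
        Guid lockId = Guid.NewGuid();
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            var claim = new SqlCommand(
                "UPDATE TOP (1) EmailQueue SET Lock = @lock WHERE Lock IS NULL", conn);
            claim.Parameters.AddWithValue("@lock", lockId);
            if (claim.ExecuteNonQuery() == 0)
                return; // queue is empty (or fully claimed by other nodes)

            var read = new SqlCommand(
                "SELECT Address, Body FROM EmailQueue WHERE Lock = @lock", conn);
            read.Parameters.AddWithValue("@lock", lockId);
            using (var reader = read.ExecuteReader())
            {
                while (reader.Read())
                    SendEmail(reader.GetString(0), reader.GetString(1)); // hypothetical helper
            }

            var delete = new SqlCommand(
                "DELETE FROM EmailQueue WHERE Lock = @lock", conn);
            delete.Parameters.AddWithValue("@lock", lockId);
            delete.ExecuteNonQuery();
        }
    }
}
```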

Now you can scale in two directions:

  • By running the same job on multiple threads concurrently.
  • By virtue of the fact this is running on multiple machines, you're effectively load balancing your send work across all your servers.

Because of the locking mechanism, you can guarantee that each email in the queue gets sent only once, even though multiple threads on multiple machines are all running the same code.


In response to comments: there are a few differences in the implementation I ended up with.

First, my ASP.NET application can notify the service that there are new emails in the queue, which means I don't even have to run on a schedule; I can simply tell the service when to start work. However, this kind of notification mechanism is very difficult to get right in a distributed environment, so simply checking the queue every minute or so should be fine.

The interval you go with really depends on the time sensitivity of your email delivery. If emails need to be delivered ASAP, you might need to trigger every 30 seconds or even less. If it's not so urgent, you can check every 5 minutes. Quartz limits the number of jobs executing at once (configurable), and you can configure what should happen if a trigger is missed, so you don't have to worry about having hundreds of jobs backing up.
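As a hedged sketch, both knobs mentioned above (concurrent-job limit and misfire handling) are set through standard Quartz.NET configuration properties passed to the scheduler factory; the specific values here are illustrative, not recommendations:

```csharp
using System.Collections.Specialized;
using Quartz;
using Quartz.Impl;

// Sketch: cap concurrent job execution and tune when a late
// trigger counts as a misfire.
var props = new NameValueCollection();
props["quartz.threadPool.threadCount"] = "5";        // at most 5 jobs at once
props["quartz.jobStore.misfireThreshold"] = "60000"; // 60s grace before "misfire"

ISchedulerFactory factory = new StdSchedulerFactory(props);
IScheduler scheduler = factory.GetScheduler();
```

Per-trigger misfire instructions (e.g. fire immediately vs. skip to the next scheduled time) are then set on the trigger itself.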

Second, I actually grab a lock on five emails at a time to reduce query load on the DB server. I deal with high volumes, so this helped efficiency (fewer network round trips between the service and the DB). The thing to watch out for here is what happens if a node goes down (for whatever reason, from an unhandled exception to the machine itself crashing) in the middle of sending a group of emails. You'll end up with "locked" rows in the DB and nothing servicing them; the larger the group, the bigger this risk. Also, an idle node obviously can't work on anything if all remaining emails are locked.
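One way to mitigate those orphaned locks (an addition, not part of the original implementation) is to also record when each lock was taken and periodically reclaim stale ones. This assumes a hypothetical LockedAt column that you set in the claim UPDATE:

```sql
-- Hypothetical: release locks older than 10 minutes so another node
-- can pick the rows back up. Requires a LockedAt column set at claim time.
UPDATE EmailQueue
SET Lock = NULL, LockedAt = NULL
WHERE Lock IS NOT NULL
  AND LockedAt < DATEADD(MINUTE, -10, GETUTCDATE());
```

Pick the timeout carefully: if a node is merely slow rather than dead, reclaiming its rows can cause a duplicate send, so the timeout should comfortably exceed the worst-case send time for a batch.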

As far as thread safety, I mean it in the general sense. Quartz maintains a thread pool, so you don't have to worry about actually managing the threads themselves.

You do have to be careful about what the code in your job accesses. As a rule of thumb, local variables should be fine. However, if you access anything outside the scope of your function, thread safety is a real concern. For example:

class EmailSender : IJob
{
    static int counter = 0;

    public void Execute(JobExecutionContext context)
    {
        counter++; // BAD!
    }
}

This code is not thread-safe because multiple threads may try to access counter at the same time.

Thread A           Thread B
Execute()
                   Execute()
Get counter (0)
                   Get counter (0)
Increment (1)
                   Increment (1)
Store value
                   Store value

            counter = 1 

counter should be 2, but instead we have an extremely hard-to-debug race condition. The next time this code runs, it might happen this way:

Thread A           Thread B
Execute()
                   Execute()
Get counter (0)
Increment (1)
Store value
                   Get counter (1)
                   Increment (2)
                   Store value

            counter = 2

...and you're left scratching your head why it worked this time.
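For what it's worth, the usual fix for this particular pattern (a sketch, not part of the original answer) is an atomic increment, which makes the read-modify-write a single indivisible operation:

```csharp
using System.Threading;
using Quartz;

class EmailSender : IJob
{
    static int counter = 0;

    public void Execute(JobExecutionContext context)
    {
        // Atomic read-modify-write: safe even when Quartz runs
        // multiple Execute calls concurrently on its thread pool.
        Interlocked.Increment(ref counter);
    }
}
```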

In your particular case, as long as you create a new database connection in each invocation of Execute and don't access any global data structures, you should be fine.

answered Oct 15 '22 by josh3736