 

Quartz Performance

It seems there is a limit on the number of jobs the Quartz scheduler can run per second. In our scenario, about 20 jobs per second are fired, 24x7. Quartz worked well up to 10 jobs per second (with 100 Quartz threads and a database connection pool size of 100 for a JDBC-backed JobStore). However, when we increased the load to 20 jobs per second, Quartz became very slow and its triggered jobs fired far later than their scheduled times, causing many misfires and eventually degrading the overall performance of the system significantly. One interesting fact is that JobExecutionContext.getScheduledFireTime().getTime() for such delayed triggers comes out to be 10-20 minutes, or even more, after their scheduled time.
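
For reference, this is roughly how the setup described above (100 Quartz threads, a JDBC-backed JobStore with a 100-connection pool, clustered across two nodes) could be wired up programmatically; the data source name, JDBC driver and URL below are placeholders, not our actual values:

import java.util.Properties;
import org.quartz.Scheduler;
import org.quartz.SchedulerException;
import org.quartz.impl.StdSchedulerFactory;

public class SchedulerBootstrap {
    public static Scheduler build() throws SchedulerException {
        Properties props = new Properties();
        // Thread pool: 100 worker threads, as in the setup described above
        props.setProperty("org.quartz.threadPool.class", "org.quartz.simpl.SimpleThreadPool");
        props.setProperty("org.quartz.threadPool.threadCount", "100");
        // JDBC-backed, clustered JobStore (two scheduler nodes sharing one database)
        props.setProperty("org.quartz.jobStore.class", "org.quartz.impl.jdbcjobstore.JobStoreTX");
        props.setProperty("org.quartz.jobStore.driverDelegateClass", "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
        props.setProperty("org.quartz.jobStore.isClustered", "true");
        props.setProperty("org.quartz.scheduler.instanceId", "AUTO");
        props.setProperty("org.quartz.jobStore.dataSource", "quartzDS");
        // Placeholder data source "quartzDS" with a pool of 100 connections
        props.setProperty("org.quartz.dataSource.quartzDS.driver", "oracle.jdbc.OracleDriver");
        props.setProperty("org.quartz.dataSource.quartzDS.URL", "jdbc:oracle:thin:@//dbhost:1521/SCHED");
        props.setProperty("org.quartz.dataSource.quartzDS.user", "quartz");
        props.setProperty("org.quartz.dataSource.quartzDS.password", "secret");
        props.setProperty("org.quartz.dataSource.quartzDS.maxConnections", "100");
        return new StdSchedulerFactory(props).getScheduler();
    }
}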

How many jobs can the Quartz scheduler run per second without affecting the scheduled times of the jobs, and what is the optimum number of Quartz threads for such a load?

Or am I missing something here?

Details about what we want to achieve:

We have almost 10k items (categorized among 2 or more categories; in the current case we have 2 categories) on which we need to do some processing at a given frequency, e.g. every 15, 30, 60... minutes, and these items should be processed within that frequency with a given throttle per minute. For example, at a 60-minute frequency, 5k items for each category should be processed with a throttle of 500 items per minute. Ideally these items should be processed within the first 10 (5000/500) minutes of each hour of the day, with 500 items processed each minute, distributed evenly across the seconds of the minute, so we would have around 8-9 items per second for one category.

To achieve this we use Quartz as the scheduler, which triggers jobs for processing these items. However, we don't process each item within the Job.execute method, because that would take 5-50 seconds (averaging about 30 seconds) per item, since the processing involves a web service call. Instead, we push a message for each item onto a JMS queue and separate server machines process those jobs. I have noticed that the time taken by the Job.execute method is never more than 30 milliseconds.
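
To make the shape of this concrete, here is a minimal sketch of what such a lightweight job might look like; JmsResources and the "itemId" job-data key are hypothetical names used for illustration, not our actual code:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import org.quartz.Job;
import org.quartz.JobExecutionContext;
import org.quartz.JobExecutionException;

// Lightweight Quartz job: it only enqueues a JMS message and returns, so
// Job.execute stays in the low-millisecond range while the 5-50 second
// web-service work is done by the worker machines consuming the queue.
public class EnqueueItemJob implements Job {

    @Override
    public void execute(JobExecutionContext context) throws JobExecutionException {
        ConnectionFactory factory = JmsResources.connectionFactory(); // hypothetical lookup
        Queue queue = JmsResources.itemProcessingQueue();             // hypothetical lookup
        String itemId = context.getMergedJobDataMap().getString("itemId");

        Connection connection = null;
        try {
            connection = factory.createConnection();
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            producer.send(session.createTextMessage(itemId));
        } catch (JMSException e) {
            throw new JobExecutionException("Failed to enqueue item " + itemId, e);
        } finally {
            if (connection != null) {
                try { connection.close(); } catch (JMSException ignore) { /* best effort */ }
            }
        }
    }
}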

Server Details:

Solaris SPARC 64-bit server with an 8-core/16-thread CPU and 16GB RAM for the scheduler, and we have two such machines in the scheduler cluster.

vikas asked Jul 19 '12 17:07

People also ask

How many jobs can quartz handle?

The actual number of jobs that can be running at any moment in time is limited by the size of the thread pool. If there are five threads in the pool, no more than five jobs can run at a time.

What is quartz misfire?

A misfire occurs if a persistent trigger “misses” its firing time because of the scheduler being shutdown, or because there are no available threads in Quartz's thread pool for executing the job. The different trigger types have different misfire instructions available to them.

How does quartz clustering work?

Quartz's clustering features bring both high availability and scalability to your scheduler via fail-over and load balancing functionality. Clustering currently only works with the JDBC-Jobstore (JobStoreTX or JobStoreCMT), and essentially works by having each node of the cluster share the same database.

How does quartz trigger work?

Quartz scheduler allows an enterprise to schedule a job at a specified date and time. It allows us to perform the operations to schedule or unschedule the jobs. It provides operations to start or stop or pause the scheduler. It also provides reminder services.


2 Answers

In a previous project, I was confronted with the same problem. In our case, Quartz performed well down to a granularity of one second. Sub-second scheduling was a stretch and, as you are observing, misfires happened often and the system became unreliable.

We solved this issue by creating two levels of scheduling: Quartz would schedule a job 'set' of n consecutive jobs. With a clustered Quartz, this means that a given server in the system would get that job 'set' to execute. The n tasks in the set are then taken over by a "micro-scheduler": basically a timing facility that used the native JDK API to further time the jobs down to a 10ms granularity.

To handle the individual jobs, we used a master-worker design, where the master was taking care of the scheduled delivery (throttling) of the jobs to a multi-threaded pool of workers.
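
As a rough illustration of that original master-worker layout (the class name, the fixed worker count and the sleep-based throttle below are assumptions for the sketch, not the project's actual code):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Master thread: pulls jobs handed over by the scheduler and delivers them to
// the worker pool at a fixed rate, so the workers never receive more than
// 'ratePerSecond' jobs per second.
class ThrottlingMaster implements Runnable {
    private final BlockingQueue<Runnable> pending;
    private final ExecutorService workers = Executors.newFixedThreadPool(16);
    private final long pauseMillis;

    ThrottlingMaster(BlockingQueue<Runnable> pending, int ratePerSecond) {
        this.pending = pending;
        this.pauseMillis = 1000L / ratePerSecond;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Runnable job = pending.take();              // wait for the next job
                workers.submit(job);                        // delegate the real work
                TimeUnit.MILLISECONDS.sleep(pauseMillis);   // simple throttle
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();             // shut down cleanly
        }
    }
}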

If I had to do this again today, I'd rely on a ScheduledThreadPoolExecutor to manage the 'micro-scheduling'. For your case, it would look something like this:

import java.util.Set;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

ScheduledThreadPoolExecutor scheduledExecutor;
...
    scheduledExecutor = new ScheduledThreadPoolExecutor(THREAD_POOL_SIZE);
...

// Evenly spread the execution of a set of tasks over a period of time
public void schedule(Set<Task> taskSet, long timePeriod, TimeUnit timeUnit) {
    if (taskSet.isEmpty()) return; // or indicate some failure ...
    long period = TimeUnit.MILLISECONDS.convert(timePeriod, timeUnit);
    long delay = period / taskSet.size();   // gap between two consecutive tasks
    long accumulativeDelay = 0;
    for (Task task : taskSet) {
        // Task is assumed to implement Runnable (or Callable)
        scheduledExecutor.schedule(task, accumulativeDelay, TimeUnit.MILLISECONDS);
        accumulativeDelay += delay;
    }
}

This gives you a general idea of how to use the JDK facility to micro-schedule tasks. (Disclaimer: you need to make this robust for a production environment, e.g. check for failing tasks, manage retries (if supported), etc...)
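
For example, to spread 500 of the per-item tasks evenly over one minute (roughly one every 120 ms), assuming Task implements Runnable and a hypothetical loadNextBatch() supplies the set:

Set<Task> minuteBatch = loadNextBatch();     // hypothetical: returns the next 500 tasks
schedule(minuteBatch, 1, TimeUnit.MINUTES);  // one task roughly every 60000/500 = 120 ms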

With some testing + tuning, we found an optimal balance between the Quartz jobs and the number of jobs in one scheduled set.

We experienced a 100X throughput improvement in this way. Network bandwidth was our actual limit.

maasg answered Oct 18 '22 14:10


First of all, check How do I improve the performance of JDBC-JobStore? in the Quartz documentation.

As you can probably guess, there is no absolute value or definitive metric; it all depends on your setup. However, here are a few hints:

  • 20 jobs per second means around 100 database queries per second, including updates and locking. That's quite a lot!

  • Consider distributing your Quartz setup to a cluster. However, if the database is the bottleneck, clustering won't help you. Maybe TerracottaJobStore will come to the rescue?

  • With K cores in the system, anything fewer than K threads will underutilize your system. If your jobs are CPU-intensive, K is fine. If they call external web services, block or sleep, consider much bigger values. However, more than 100-200 threads will significantly slow down your system due to context switching (see the sketch after this list).

  • Have you tried profiling? What is your machine doing most of the time? Can you post a thread dump? I suspect poor database performance rather than CPU, but it depends on your use case.
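
As a rough sketch of the thread-count point above (the wait/compute heuristic and the sample numbers are assumptions for illustration, not something Quartz prescribes):

// CPU-bound jobs: roughly one thread per core (K) is enough.
int cores = Runtime.getRuntime().availableProcessors();
int cpuBoundThreads = cores;

// Blocking jobs (web service calls, sleeps): scale up by how long a job waits
// relative to how long it computes, but stay under the ~100-200 ceiling
// mentioned above to limit context-switching overhead.
double waitMillis = 30000;   // assumed average time a job spends blocked
double computeMillis = 30;   // assumed average CPU time per job
int blockingThreads = (int) Math.min(cores * (1 + waitMillis / computeMillis), 200);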

Tomasz Nurkiewicz answered Oct 18 '22 13:10