 

Java, Quartz and multiple tasks triggered at certain times saved in a database

I'm building a system where users can set a future date (down to hours and minutes) in a calendar. At that date, a trigger calls a certain task, unique to each user.

Every user can set a different date. The system will have 10k+ users from the start, and a user can create more than one trigger.

So assuming I have 10k users and each user creates 3 triggers on average, that gives 30k triggers with 30k different dates.

All dates are saved in a database.

I'm new to Quartz. Can this be done in a more optimized way?

I was thinking about having a job run every minute that fetches the tasks that are supposed to run in the next hour and removes them from the database.

Do you have any better ideas? Has anyone used Quartz for a large number of triggers?

Doua Beri asked Sep 14 '16 18:09



1 Answer

You have the schedule stored in the database. If I understand the idea correctly, you want Quartz to load all the upcoming tasks so that it can execute them in the future.

This is a problematic approach:

  1. Synchronization issues: I assume that users can edit, remove and add tasks in the database. You would have to query the database periodically to refresh the state of the Quartz jobs: removing some jobs, editing others, and so on. This may not be trivial. The state of the program would effectively be a long-lived cache that needs to be synchronised often.

  2. Performance and scalability issues: Even if the proposed solution is OK for 30k tasks, it may not be OK for 70k or 700k tasks. Your approach is not easy to scale: adding a new machine would require an additional layer of synchronisation to decide which machine should actually execute which job (as all of them hold all the tasks).

What I would propose:

  • Add a "stage" column to the Tasks table (new, queued, running, finished, failed).
  • Divide your solution into several components. (Initially they can run on a single machine, but it will be easy to scale out.)
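
To make the "stage" idea concrete, here is a minimal Java sketch of the five stages as an enum. The type name `TaskStage` and the transition rules in `allowedNext` are assumptions made for illustration (the answer only lists the stage names); in particular, allowing a move back to "new" is just one possible way to model a re-run.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical lifecycle stages for a row in the Tasks table. */
enum TaskStage {
    NEW, QUEUED, RUNNING, FINISHED, FAILED;

    /** Stages this stage may move to. These rules are an assumption, not part of the answer. */
    Set<TaskStage> allowedNext() {
        switch (this) {
            case NEW:
                return EnumSet.of(QUEUED);
            case QUEUED:
                return EnumSet.of(RUNNING, NEW);          // NEW = put back after a timeout
            case RUNNING:
                return EnumSet.of(FINISHED, FAILED, NEW); // NEW = re-run after a timeout
            default:
                return EnumSet.noneOf(TaskStage.class);   // FINISHED and FAILED are terminal here
        }
    }

    /** True if a task in this stage may be moved to the given stage. */
    boolean canMoveTo(TaskStage next) {
        return allowedNext().contains(next);
    }
}
```

A component would call something like `current.canMoveTo(TaskStage.RUNNING)` before updating the row, so that an out-of-order update is rejected instead of silently overwriting the stage.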

Components:

  • Task Finder: Executed periodically (once every few seconds). Scans the database for tasks that are "new" and due soon, sends them to the Message Queue and marks them as "queued" in the db. Marking as "queued" has to be done carefully, as there can be multiple Task Finders. (In addition, it may look for tasks that were marked as "queued" or "running" more than N minutes ago and are neither "finished" nor "canceled" - these probably need to be re-run.)

  • Message Queue: Connector between the Task Finder and the Task Executor.

  • Task Executor: Listens to the Message Queue and processes the tasks it receives. Marks each task as "running" initially and as "finished" or "failed" later on.
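
The three components can be sketched in plain Java as follows. This is an in-memory stand-in under stated assumptions, not a production design: a `ConcurrentHashMap` plays the role of the Tasks table, a `LinkedBlockingQueue` plays the role of the message queue, and every name (`TaskPipelineSketch`, `findAndQueue`, `executeQueued`) is hypothetical. The point it illustrates is the "careful" marking: a task is claimed with an atomic compare-and-set, so two finders cannot queue the same task.

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** In-memory stand-in for the finder / queue / executor pipeline (all names hypothetical). */
class TaskPipelineSketch {
    enum Stage { NEW, QUEUED, RUNNING, FINISHED, FAILED }

    static final class Task {
        final long id;
        final Instant dueAt;
        final Runnable work;
        Task(long id, Instant dueAt, Runnable work) {
            this.id = id;
            this.dueAt = dueAt;
            this.work = work;
        }
    }

    // Together these two maps stand in for the Tasks table.
    final Map<Long, Task> tasks = new ConcurrentHashMap<>();
    final Map<Long, Stage> stages = new ConcurrentHashMap<>();
    // Stands in for the message queue between finder and executor.
    final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();

    void add(Task t) {
        tasks.put(t.id, t);
        stages.put(t.id, Stage.NEW);
    }

    /** Task Finder: claims NEW tasks due at or before the cutoff and publishes their ids. */
    int findAndQueue(Instant cutoff) {
        int claimed = 0;
        for (Task t : tasks.values()) {
            if (t.dueAt.isAfter(cutoff)) continue;
            // Atomic claim: succeeds only if the stage is still NEW. With a real database
            // this would roughly be: UPDATE tasks SET stage='queued' WHERE id=? AND stage='new'
            if (stages.replace(t.id, Stage.NEW, Stage.QUEUED)) {
                queue.add(t.id);
                claimed++;
            }
        }
        return claimed;
    }

    /** Task Executor: drains the queue and records the outcome of each task. */
    int executeQueued() {
        int ran = 0;
        Long id;
        while ((id = queue.poll()) != null) {
            if (!stages.replace(id, Stage.QUEUED, Stage.RUNNING)) continue; // claimed elsewhere
            try {
                tasks.get(id).work.run();
                stages.put(id, Stage.FINISHED);
            } catch (RuntimeException e) {
                stages.put(id, Stage.FAILED);
            }
            ran++;
        }
        return ran;
    }

    public static void main(String[] args) {
        TaskPipelineSketch pipeline = new TaskPipelineSketch();
        Instant now = Instant.now();
        pipeline.add(new Task(1L, now.minusSeconds(5), () -> System.out.println("running task 1")));
        pipeline.add(new Task(2L, now.plusSeconds(3600), () -> System.out.println("running task 2")));
        pipeline.findAndQueue(now); // only task 1 is due
        pipeline.executeQueued();
        System.out.println(pipeline.stages);
    }
}
```

In a real deployment the finder and the executor would be separate processes and the queue a real broker; the conditional update is what lets several finders run side by side without handing the same task out twice.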

With this approach you can have:

  • multiple Task Executors on multiple machines
  • multiple Task Finders on multiple machines
  • no Single Point of Failure: even if one of the Task Finders or Executors fails, some of the tasks will be delayed, but they will be picked up and run afterwards.
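
The "picked up and run afterwards" part depends on the timeout check mentioned for the Task Finder. Below is a minimal sketch of such a sweep, again in memory and with hypothetical names (`StaleTaskSweeper`, `requeueStale`); it assumes each row stores the time at which its current stage was entered, which the answer does not spell out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch of a sweep that resets tasks stuck in "queued" or "running" (names hypothetical). */
class StaleTaskSweeper {
    enum Stage { NEW, QUEUED, RUNNING, FINISHED, FAILED }

    /** Stage plus the time it was entered; stands in for two columns of the Tasks table. */
    static final class State {
        final Stage stage;
        final Instant since;
        State(Stage stage, Instant since) {
            this.stage = stage;
            this.since = since;
        }
    }

    final Map<Long, State> table = new ConcurrentHashMap<>();

    /** Moves tasks that have been QUEUED or RUNNING for longer than the timeout back to NEW. */
    int requeueStale(Instant now, Duration timeout) {
        int reset = 0;
        for (Map.Entry<Long, State> e : table.entrySet()) {
            State s = e.getValue();
            boolean inFlight = s.stage == Stage.QUEUED || s.stage == Stage.RUNNING;
            if (inFlight && s.since.plus(timeout).isBefore(now)) {
                // Conditional replace: succeeds only if the row still holds the state we read.
                if (table.replace(e.getKey(), s, new State(Stage.NEW, now))) {
                    reset++;
                }
            }
        }
        return reset;
    }
}
```

Because a task reset this way may in fact have been partly executed before its executor died, the tasks themselves should tolerate being run more than once.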

This may not address all the scenarios but would be a good starting point.

Piotr Reszke answered Nov 08 '22 05:11
