Distributed job scheduling, management, and reporting

I recently had a play around with Hadoop and was impressed with its scheduling, management, and reporting of MapReduce jobs. It appears to make the distribution and execution of new jobs quite seamless, allowing the developer to concentrate on the implementation of their jobs.

I am wondering if anything exists in the Java domain for the distributed execution of jobs that are not easily expressed as MapReduce problems? For example:

  • Jobs that require task co-ordination and synchronization. For example, they may involve sequential execution of tasks yet it is feasible to execute some tasks concurrently:

                   .-- B --.
            .--A --|       |--.
            |      '-- C --'  |
    Start --|                 |-- Done
            |                 |
            '--D -------------'
    
  • CPU-intensive tasks that you'd like to distribute but that don't produce any outputs to reduce - image conversion/resizing, for example.
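
To make the dependency graph in the first bullet concrete, here is a minimal single-JVM sketch using `java.util.concurrent.CompletableFuture`. It only illustrates the ordering constraints (B and C wait for A, D runs independently, Done waits for everything). It is not a distributed solution, and the class name `TaskGraphSketch`, the `task` helper, and the `runGraph` method are hypothetical names made up for illustration.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class TaskGraphSketch {

    // Hypothetical stand-in for real work: records the task name when it runs.
    static Runnable task(String name, List<String> log) {
        return () -> log.add(name);
    }

    // Runs the graph from the question and returns the order in which tasks ran.
    static List<String> runGraph() {
        List<String> log = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // A and D have no prerequisites, so both start immediately.
            CompletableFuture<Void> a = CompletableFuture.runAsync(task("A", log), pool);
            CompletableFuture<Void> d = CompletableFuture.runAsync(task("D", log), pool);

            // B and C each depend on A, but may run concurrently with each other.
            CompletableFuture<Void> b = a.thenRunAsync(task("B", log), pool);
            CompletableFuture<Void> c = a.thenRunAsync(task("C", log), pool);

            // Done runs only after B, C, and D have all completed.
            CompletableFuture<Void> done =
                    CompletableFuture.allOf(b, c, d).thenRun(task("Done", log));

            done.join(); // block until the whole graph has finished
            return log;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runGraph());
    }
}
```

The exact interleaving of B, C, and D varies between runs; the only guarantees are that A precedes B and C, and that Done comes last. What I am looking for is something that gives this kind of co-ordination across machines, with the scheduling and reporting that Hadoop provides for MapReduce jobs.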

So is there a Java framework/platform that provides such a distributed computing environment? Or is this sort of thing acceptable/achievable using Hadoop - and if so are there any patterns/guidelines for these sorts of jobs?

teabot asked Dec 16 '09 14:12

1 Answer

I have since found Spring Batch and Spring Batch Integration, which appear to address many of my requirements. I will let you know how I get on.

teabot answered Nov 04 '22 11:11
