 

Running jobs in parallel in Hadoop

Tags:

hadoop

I am new to Hadoop.

I have set up a 2-node cluster.

How do I run 2 jobs in parallel in Hadoop?

When I submit jobs, they run one by one in FIFO order. I need them to run in parallel. How do I achieve that?

Thanks, MRK

MRK asked Sep 20 '11

People also ask

What is used to run multiple jobs in parallel in Hadoop?

If you use HadoopActivity with either the Fair Scheduler or the Capacity Scheduler, you can run multiple steps in parallel.

How Hadoop runs a MapReduce job?

MapReduce assigns fragments of data across the nodes in a Hadoop cluster. The goal is to split a dataset into chunks and use an algorithm to process those chunks at the same time. The parallel processing on multiple machines greatly increases the speed of handling even petabytes of data.

What is a job in MapReduce?

A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner. The framework sorts the outputs of the maps, which are then input to the reduce tasks. Typically both the input and the output of the job are stored in a file-system.
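To make that flow concrete, here is a minimal word-count sketch against the classic Hadoop MapReduce Java API. It is the standard textbook example rather than anything specific to the question above; input and output paths come from the command line.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Each mapper processes one input split (chunk); splits run in parallel.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);  // emit (word, 1) for each token
            }
        }
    }

    // The framework sorts map output by key before handing it to reducers.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);  // emit (word, total count)
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```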


3 Answers

Hadoop can be configured with a number of schedulers and the default is the FIFO scheduler.

The FIFO scheduler behaves like this:

Scenario 1: If the cluster has a capacity of 10 map tasks and job1 needs 15, job1 initially takes the complete cluster. As job1 makes progress and frees slots it no longer needs, job2 starts running on them.

Scenario 2: If the cluster has a capacity of 10 map tasks and job1 needs only 6, job1 takes 6 slots and job2 takes the remaining 4; job1 and job2 run in parallel.

To run jobs in parallel from the start, configure either the Fair Scheduler or the Capacity Scheduler, depending on your requirements. For this to take effect, set mapreduce.jobtracker.taskscheduler and the scheduler-specific parameters in mapred-site.xml, as sketched below.
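As a sketch, the mapred-site.xml entries for switching an MR1 JobTracker to the Fair Scheduler might look like the following; property names differ across Hadoop versions (older 1.x releases use mapred.jobtracker.taskScheduler, and the allocation-file path here is just an illustrative example), so check the documentation for your release:

```xml
<!-- mapred-site.xml: sketch for enabling the Fair Scheduler on an MR1
     JobTracker. Property names vary by Hadoop version; verify against
     your release's documentation. -->
<configuration>
  <!-- Swap the default FIFO scheduler for the Fair Scheduler.
       (On older 1.x releases the key is mapred.jobtracker.taskScheduler.) -->
  <property>
    <name>mapreduce.jobtracker.taskscheduler</name>
    <value>org.apache.hadoop.mapred.FairScheduler</value>
  </property>

  <!-- Optional: per-pool allocations (shares, limits) live in a separate
       allocation file. This path is an illustrative example. -->
  <property>
    <name>mapred.fairscheduler.allocation.file</name>
    <value>/etc/hadoop/conf/fair-scheduler.xml</value>
  </property>
</configuration>
```

For the Capacity Scheduler, the value would instead be org.apache.hadoop.mapred.CapacityTaskScheduler, which reads its queue definitions from a separate capacity-scheduler.xml.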

Edit: Updated the answer based on the comment from MRK.

Praveen Sripati answered Oct 14 '22


You have "Map Task Capacity" and "Reduce Task Capacity". Whenever those are free they would pick the job in FIFO order. Your submitted jobs contains mapper and optionally reducer. If your jobs mapper (and/or reducer) count is smaller then the cluster's capacity it would take the next jobs mapper (and/or reducer).

If you don't like FIFO, you can always assign priorities to your submitted jobs, as in the sketch below.
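For example, with the old (mapred) Java API you can set a priority when configuring the job; this is a sketch with the mapper/reducer wiring omitted, and PriorityDemo is a placeholder class name:

```java
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;

// Sketch: submit a job at elevated priority with the classic mapred API.
public class PriorityDemo {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PriorityDemo.class);
        conf.setJobName("high-priority-job");
        // Priorities: VERY_HIGH, HIGH, NORMAL (default), LOW, VERY_LOW.
        conf.setJobPriority(JobPriority.HIGH);
        // ... set mapper, reducer, and input/output paths here ...
        JobClient.runJob(conf);
    }
}
```

A queued job's priority can also be changed after submission with hadoop job -set-priority <job-id> HIGH. Note that under FIFO, priority only reorders the queue; it does not preempt a job that is already running.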

Edit:

Sorry about the slight misinformation; Praveen's answer is the right one. In addition to his answer, you can check out the HOD scheduler as well.

frail answered Oct 15 '22


With the default scheduler, only one job per user runs at a time. You can launch different jobs from different user IDs and they will run in parallel; of course, as mentioned by others, you need enough slot capacity.

kiru answered Oct 15 '22